IIIT Hyderabad

Document Image Retrieval using Bag of Visual Words Model

Ravi Shekhar, CVIT, IIIT Hyderabad

Advisor: Prof. C.V. Jawahar

Upload: harry-chandler

Post on 04-Jan-2016



Motivation

• A large number of printed books are being digitized
• Digital libraries such as the Universal Digital Library (UDL), the Digital Library of India (DLI) and Google Books
• An efficient and effective methodology is needed for content-level access

(Figure: digital library database)

Process Overview

(Figure: documents are scanned, processed and stored in an index database; an input query is processed and matched against the index to produce the retrieved documents)

Matching can be done at two levels: “text” and “image”.

Matching Approaches

• Recognition-based approach (text-level matching)
  • Optical Character Recognition (OCR)
• Recognition-free approach (image-level matching)
  • Word spotting

Recognition-Based Approach

• Optical Character Recognition (OCR)
  • Binarization of the document
  • Segmentation using connected components
    • Line level
    • Word level
    • Character level
  • Character recognition using features such as patch and profile features
  • Classification using an ANN or an SVM
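The first two OCR stages above (binarization and connected-component segmentation) can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as nested lists of 0 to 255 values, a fixed global threshold of 128 and 8-connectivity; the function names are hypothetical, and practical systems usually pick the threshold adaptively (for example with Otsu's method).

```python
from collections import deque

def binarize(gray, threshold=128):
    """Global-threshold binarization: dark (ink) pixels become 1, background 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Label 8-connected ink regions with breadth-first search.

    Returns (labels, count), where labels[y][x] is 0 for background and
    1..count for the component a pixel belongs to.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count
```

Grouping the resulting components into lines, words and characters (for example by projection profiles or inter-component gaps) is not shown.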

Limitations of Recognition-Based Approach

• Cuts
• Merges
• Variation in script
• Variation in font and typesetting
• Underlined and overwritten text

Recognition-Free Approach

• Word spotting
  • Representation of word images using global (profile) features
  • Matching features using distance measures such as L1 and L2
  • Comparison of word images of different sizes using Dynamic Time Warping (DTW)
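DTW comparison of two word images can be sketched as below, assuming each word image has already been turned into a sequence of per-column profile feature vectors and using the L1 distance as the local cost. This is an illustrative sketch, not the exact formulation used in this work; `dtw_distance` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Each element is one column's feature vector; the local cost is the L1
    distance, and the alignment may stretch either sequence.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum(abs(p - q) for p, q in zip(a[i - 1], b[j - 1]))
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]
```

Each comparison fills an n × m table, so matching a query against every word image in a large collection is expensive; this quadratic cost per pair is one reason word spotting struggles to run in real time.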

Why a Recognition-Free Approach?

• Robust OCRs are unavailable for many non-Latin languages
• These languages have a rich heritage, and there is a need for content-level search
• Word-spotting-based methods are too slow for real-time systems
• Most existing retrieval methods are memory intensive
• Scalability is an immediate challenge

Word Image Retrieval using Bag of Visual Words

Bag of Visual Words (BoVW)

• The Bag of Words (BoW) representation is the most popular representation for text retrieval
• Efficient BoW-based systems such as Lucene are publicly available
• Bag of Visual Words (BoVW) performs excellently for image and video retrieval
• BoVW-based systems are flexible, powerful and scalable to billions of images
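To illustrate why BoW-style indexing scales, here is a minimal inverted index over visual-word IDs with tf-idf weighting and cosine scoring. It is a toy stand-in written for illustration: it is not Lucene's API, and the weighting scheme and class name are assumptions.

```python
import math
from collections import Counter, defaultdict

class InvertedIndex:
    """Toy inverted index over visual-word IDs with tf-idf cosine scoring."""

    def __init__(self):
        self.postings = defaultdict(dict)  # visual word -> {doc_id: term frequency}
        self.doc_norms = {}
        self.num_docs = 0

    def add(self, doc_id, visual_words):
        """Index one word image given the visual-word IDs of its descriptors."""
        self.num_docs += 1
        for word, tf in Counter(visual_words).items():
            self.postings[word][doc_id] = tf

    def idf(self, word):
        df = len(self.postings.get(word, {}))
        return math.log(self.num_docs / df) if df else 0.0

    def finalize(self):
        """Precompute tf-idf vector lengths once all word images are added."""
        squares = defaultdict(float)
        for word, plist in self.postings.items():
            w_idf = self.idf(word)
            for doc_id, tf in plist.items():
                squares[doc_id] += (tf * w_idf) ** 2
        self.doc_norms = {d: math.sqrt(s) for d, s in squares.items()}

    def search(self, visual_words, top_k=10):
        """Cosine similarity; only the posting lists of the query's words are read."""
        query = Counter(visual_words)
        q_norm = math.sqrt(sum((tf * self.idf(w)) ** 2 for w, tf in query.items())) or 1.0
        scores = defaultdict(float)
        for word, q_tf in query.items():
            w_idf = self.idf(word)
            for doc_id, tf in self.postings.get(word, {}).items():
                scores[doc_id] += (q_tf * w_idf) * (tf * w_idf)
        ranked = sorted(((d, s / (q_norm * (self.doc_norms.get(d) or 1.0)))
                         for d, s in scores.items()), key=lambda t: t[1], reverse=True)
        return ranked[:top_k]
```

The key design point is that query time depends on the length of the posting lists touched, not on the total number of indexed images, which is what lets BoW systems grow to very large collections.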

BoVW Representation

• Word images are represented using a histogram of visual words
• Codebook generation
  • A subset of the images is used
  • Clustering is done using Hierarchical K-Means (HKM)
  • HKM is faster than K-Means both in building the tree and in finding nearest neighbours
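A compact sketch of codebook generation with hierarchical k-means and the resulting histogram of visual words. The branching factor, depth, iteration count and greedy descent are illustrative assumptions, descriptors are plain lists of floats, and all function names are hypothetical. Quantizing a descriptor with the tree costs about branch × depth distance computations instead of one per visual word, which is where HKM's speed advantage over flat k-means comes from.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def _nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: _dist2(x, centers[i]))

def kmeans(points, k, iters=10, rng=random):
    """Plain Lloyd's k-means; returns min(k, len(points)) centers."""
    centers = [list(c) for c in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[_nearest(p, centers)].append(p)
        # keep the old center if a cluster becomes empty
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def build_hkm(points, branch, depth, rng=random):
    """Recursively split descriptors into `branch` clusters for `depth` levels."""
    if depth == 0 or len(points) <= branch:
        return {"children": []}  # leaf = one visual word
    centers = kmeans(points, branch, rng=rng)
    groups = [[] for _ in centers]
    for p in points:
        groups[_nearest(p, centers)].append(p)
    return {"children": [(c, build_hkm(g, branch, depth - 1, rng))
                         for c, g in zip(centers, groups)]}

def assign_leaf_ids(node, next_id=0):
    """Number the leaves left to right; returns the vocabulary size."""
    if not node["children"]:
        node["id"] = next_id
        return next_id + 1
    for _, child in node["children"]:
        next_id = assign_leaf_ids(child, next_id)
    return next_id

def quantize(tree, x):
    """Greedy descent: at each level compare x only with the current node's children."""
    node = tree
    while node["children"]:
        node = min(node["children"], key=lambda cc: _dist2(x, cc[0]))[1]
    return node["id"]

def bovw_histogram(tree, descriptors, vocab_size):
    """Histogram of visual words for one word image."""
    hist = [0] * vocab_size
    for d in descriptors:
        hist[quantize(tree, d)] += 1
    return hist
```

Usage: build the tree once on descriptors from a subset of word images, call `assign_leaf_ids` to get the vocabulary size, then call `bovw_histogram` per word image.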

BoVW-Based Representation

(Figure: a word image and its histogram of visual words)
(Figure: a word image with cuts and its histogram of visual words)
(Figure: a word image with merges and its histogram of visual words)

Proposed Architecture

(Figure: proposed architecture)

Advantages of BoVW-Based Representation

• Fixed-size representation
• Robust against degradation
• Scalable to billions of images
• Language independent

(Figures: clean, cut and merged versions of a word image)

Spatial Verification

• Lost geometry: a histogram of visual words does not record where each visual word occurs in the word image
• Spatial verification

(Figures: clean word images illustrating lost geometry and spatial verification)

Re-ranking

• SIFT-based re-ranking
• The higher the total score, the better the match

$$\mathrm{Score}(I_i, I_j) = \frac{\#\,\text{Match Points}}{\#\,\text{SIFT in } I_i + \#\,\text{SIFT in } I_j}$$

$$\mathrm{Score}_{\text{Total}}(I_i, I_j) = \mathrm{Score}(I_i, I_j) + \sum_{k=1}^{3} \mathrm{Score}(I_i^k, I_j^k)$$

where $\mathrm{Score}(I_i, I_j)$ is the score for the entire image and $\mathrm{Score}(I_i^k, I_j^k)$ is the score for the $k$th part of the image.
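A sketch of the SIFT-based re-ranking score, assuming SIFT descriptors are already extracted as float lists for the whole image and for each of its three parts. Two details are assumptions rather than statements from the slides: match points are counted with Lowe's ratio test (threshold 0.8), and the match count is normalized by the sum of the two images' descriptor counts. How the image is divided into parts is not specified here, so the caller passes the per-part descriptor lists; all names are hypothetical.

```python
import math

def count_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors of A whose nearest neighbour in B passes Lowe's ratio test."""
    if len(desc_b) < 2:
        return 0
    matches = 0
    for d in desc_a:
        nearest, second = sorted(math.dist(d, e) for e in desc_b)[:2]
        if nearest < ratio * second:
            matches += 1
    return matches

def score(desc_a, desc_b):
    """#match points normalised by the total number of SIFT descriptors in both images."""
    total = len(desc_a) + len(desc_b)
    return count_matches(desc_a, desc_b) / total if total else 0.0

def total_score(whole_a, whole_b, parts_a, parts_b):
    """Whole-image score plus the scores of corresponding parts (three on the slide)."""
    return score(whole_a, whole_b) + sum(score(pa, pb) for pa, pb in zip(parts_a, parts_b))
```

Adding the part scores rewards candidates whose matches are spread across the same regions as the query, which partly restores the geometry that the histogram discards.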

Experiments

Books used in the experiments

Language | #Books | #Pages | #Words
Hindi | 4 | 427 | 112677
Malayalam | 6 | 610 | 108767
Telugu | 5 | 742 | 131156
Bangla | 3 | 363 | 124584
Hindi | 32 | 3992 | 1008138

Quantitative Results

Performance statistics (mAP)

Language | #Images | #Queries | mAP | mAP after Re-ranking | mAP after Spatial Verification
Hindi | 112677 | 138 | 0.6808 | 0.7820 | 0.7865
Malayalam | 108767 | 101 | 0.6962 | 0.7991 | 0.8188
Telugu | 131156 | 131 | 0.6483 | 0.7328 | 0.7495
Bangla | 124584 | 125 | 0.7806 | 0.8766 | 0.8947
Hindi | 1008138 | 138 | 0.5895 | 0.7022 | 0.7062

Quantitative Results

Performance statistics (Prec@10)

Language | #Images | #Queries | Prec@10 | Prec@10 after Re-ranking | Prec@10 after Spatial Verification
Hindi | 112677 | 138 | 0.8437 | 0.8719 | 0.8770
Malayalam | 108767 | 101 | 0.7668 | 0.8328 | 0.8581
Telugu | 131156 | 131 | 0.8507 | 0.8668 | 0.883
Bangla | 124584 | 125 | 0.8498 | 0.9022 | 0.9182
Hindi | 1008138 | 138 | 0.8059 | 0.8509 | 0.8543
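For reference, the two metrics reported in these tables can be computed per query as below, using the standard (non-interpolated) definitions. The slides do not state which AP variant or relevance judgments were used, so this is an assumed reading, and the function names are hypothetical.

```python
def average_precision(relevance, num_relevant=None):
    """Average precision of one ranked list (relevance[i] is True if rank i+1 is relevant).

    By default the number of relevant items is taken from the list itself; pass the
    collection-wide count as num_relevant to also penalise relevant items never retrieved.
    """
    total = sum(relevance) if num_relevant is None else num_relevant
    if total == 0:
        return 0.0
    hits, acc = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            acc += hits / rank  # precision at each relevant position
    return acc / total

def precision_at_k(relevance, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(relevance[:k]) / k

def mean_average_precision(runs):
    """mAP over a list of per-query relevance lists."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

For example, a ranked list judged relevant, non-relevant, relevant, non-relevant has AP = (1/1 + 2/3) / 2 ≈ 0.833.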

Quantitative Results

• mAP vs. query length
  • The more characters in the query, the better the results

(Figure: mAP vs. query length)

Quantitative Results

Retrieval time and index size

#Images | Retrieval Time | Index Size
25K | 50 ms | 28 MB
100K | 209 ms | 130 MB
0.5M | 411 ms | 550 MB
1M | 700 ms | 1.2 GB

Qualitative Results

(Figures: query word images and their retrieved results)

• Sample output for noisy images where a commercial OCR fails

(Figure: query and retrieved results)

Enhancements over Bag of Visual Words Based Word Image Retrieval

Query Expansion

• Observation: the top-ranked results are correct
• The top-k results are used to form a new query
• Improves the precision of the retrieved list
• Modified average query expansion
  • Instead of giving equal weight to each of the top-k results, a rank-based weight of 1/2^rank is used
• Improves mAP and Prec@10 by 2%
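The rank-weighted expansion described above can be sketched as follows. Two choices are assumptions not stated on the slide: the original query histogram is kept in the average with weight 1, and the weighted sum is normalized by the total weight; `expand_query` is a hypothetical name.

```python
def expand_query(query_hist, ranked_hists, k=5):
    """Rank-weighted average query expansion.

    The original query histogram keeps weight 1 (assumption), the result at
    rank r (r = 1..k) gets weight 1 / 2**r, and the weighted sum is divided
    by the total weight.
    """
    top = ranked_hists[:k]
    weights = [1.0] + [1.0 / 2 ** r for r in range(1, len(top) + 1)]
    hists = [query_hist] + top
    total = sum(weights)
    return [sum(w * h[i] for w, h in zip(weights, hists)) / total
            for i in range(len(query_hist))]
```

The expanded histogram is then issued as a new query against the same index. Halving the weight at each rank lets lower-ranked (less reliable) results refine the query without dominating it.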

(Figure: the query image's histogram is used to query the index; the histograms of the top-ranked results, Rank 1 to Rank 6, are combined into a refined query histogram)

(Figure: the index is queried again with the expanded query histogram; previous results and modified results, Rank 1 to Rank 6)

Text Query Support

• The method was originally formulated in a “query by example” setting, but users would prefer a textual interface to document image collections
• We propose a novel and simple framework for text query support
• A small subset of the data with ground truth, covering all possible characters of a particular language, is used
• Visual words are learnt specific to each character and averaged across its different variations
• Given a textual query, we synthesize its BoVW histogram
• Text query results are comparable to word image results

(Figure: query-by-example setting, where the input query image is converted to its histogram)

(Figure: text query support, where the input text query is converted to a text query histogram)
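The slides do not spell out how per-character histograms are combined into a query histogram. The sketch below assumes the simplest rule, summing the learnt histograms of the query's characters, and treats each Unicode code point as a character; for Indic scripts a segmentation into akshara-like units would likely be needed instead. Both function names are hypothetical.

```python
from collections import defaultdict

def learn_char_histograms(samples, vocab_size):
    """Average the visual-word histograms of each character over its variations.

    `samples` is a list of (character, histogram) pairs taken from the small
    ground-truth subset.
    """
    sums = defaultdict(lambda: [0.0] * vocab_size)
    counts = defaultdict(int)
    for ch, hist in samples:
        acc = sums[ch]
        for i, v in enumerate(hist):
            acc[i] += v
        counts[ch] += 1
    return {ch: [v / counts[ch] for v in acc] for ch, acc in sums.items()}

def synthesize_query_histogram(text, char_hists, vocab_size):
    """Assumed composition rule: sum the learnt histograms of the query's characters."""
    hist = [0.0] * vocab_size
    for ch in text:
        if ch not in char_hists:
            raise KeyError(f"no visual words learnt for character {ch!r}")
        for i, v in enumerate(char_hists[ch]):
            hist[i] += v
    return hist
```

The synthesized histogram can then be sent to the same index as a histogram computed from a query image.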

Qualitative Results

(Figure: sample output for queries using different techniques)

Vector Quantization

• In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

$$\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 \quad \text{s.t.}\ \lVert c_i \rVert_{\ell_0} = 1,\ \lVert c_i \rVert_{\ell_1} = 1,\ c_i \succeq 0,\ \forall i$$

where $x_i$ is a descriptor, $B$ is the codebook, and $c_i$ is the code of $x_i$ ($C = [c_1, \dots, c_N]$).

(Figure: (a) hard assignment of an input descriptor)

• Problems with VQ
  • Visual word uncertainty: a single VW is chosen out of two or more plausible ones
  • Visual word plausibility: a VW is assigned even when the vocabulary has no suitable candidate
• Solution: soft assignment
  • Map each feature vector to two or more possible VWs

Soft Assignment

• Map each feature vector to two or more possible VWs
• Approaches to soft assignment
  • Distance based
    • Equal weight
    • Based on distance in feature space
    • Gaussian distance
    • Does not minimize the reconstruction error
  • Through learning an optimal reconstruction

(Figure: soft assignment of an input descriptor)
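A sketch of the distance-based variant with Gaussian weights: each descriptor is spread over its k nearest visual words. The values of k and the kernel width sigma, and both function names, are assumptions for illustration. As the slide notes, this weighting is not chosen to minimize reconstruction error.

```python
import math

def soft_assign(x, codebook, k=3, sigma=1.0):
    """Spread one descriptor over its k nearest visual words.

    Weights come from a Gaussian kernel exp(-d^2 / (2 * sigma^2)) on the
    Euclidean distance d and are normalised to sum to 1.
    """
    nearest = sorted((math.dist(x, b), j) for j, b in enumerate(codebook))[:k]
    weights = [math.exp(-d * d / (2 * sigma * sigma)) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # every kernel value underflowed: fall back to hard assignment
        weights, total = [1.0] + [0.0] * (len(nearest) - 1), 1.0
    code = [0.0] * len(codebook)
    for w, (_, j) in zip(weights, nearest):
        code[j] = w / total
    return code

def soft_histogram(descriptors, codebook, k=3, sigma=1.0):
    """Accumulate the soft codes of all descriptors of a word image."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        for j, w in enumerate(soft_assign(x, codebook, k, sigma)):
            hist[j] += w
    return hist
```

Setting k = 1 recovers hard assignment (VQ), and replacing the kernel weights with a constant gives the equal-weight variant.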

Locality-constrained Linear Coding (LLC)

• Similar patches should have similar codes
• The locality of visual words is used to describe a feature vector

$$\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 + \lambda \lVert d_i \odot c_i \rVert^2 \quad \text{s.t.}\ \mathbf{1}^\top c_i = 1,\ \forall i$$

where $\odot$ denotes element-wise multiplication and

$$d_i = \exp\left(\frac{\operatorname{dist}(x_i, B)}{\sigma}\right)$$

with $\operatorname{dist}(x_i, B)$ the vector of Euclidean distances between $x_i$ and each visual word in $B$.

• LLC coding process
  • Find the K nearest neighbours of $x_i$ in the codebook, denoted $B_i$
  • Reconstruct $x_i$ using $B_i$
  • Replace the input $x_i$ with the non-zero code obtained in the previous step

(Figure: LLC coding of an input descriptor)
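The three-step coding process can be sketched with the fast approximated LLC described by Wang et al. (CVPR 2010): the locality term is replaced by restricting the code to the K nearest visual words, and the sum-to-one least-squares problem is solved through the regularized local covariance. The regularization constant `beta`, the default K and the function names are assumptions for illustration.

```python
import math

def _solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximated LLC code of descriptor x (sparse, sums to 1)."""
    # Step 1: the k nearest codewords of x form the local base B_i
    nn = sorted(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))[:k]
    # Step 2: reconstruct x from B_i: minimise ||x - B_i^T w||^2 s.t. sum(w) = 1,
    # solved via the regularised local covariance C w = 1, then normalised
    z = [[bj - xj for bj, xj in zip(codebook[j], x)] for j in nn]
    c = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    tr = sum(c[i][i] for i in range(len(nn)))
    for i in range(len(nn)):
        c[i][i] += beta * tr if tr > 0 else beta
    w = _solve(c, [1.0] * len(nn))
    s = sum(w)
    # Step 3: scatter the k weights into a code over the whole codebook
    code = [0.0] * len(codebook)
    for j, wj in zip(nn, w):
        code[j] = wj / s
    return code
```

Because only K visual words receive non-zero weights, similar descriptors (which share nearest neighbours) tend to receive similar codes.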

Re-ranking

• SIFT-based re-ranking [1]
• Longest common subsequence (LCS) based re-ranking [2]
  • Length of the LCS of the visual words projected on the x-axis
  • The longer the LCS, the better the match
• Linear combination [2]
  Final Score = λ × Index_Score + (1 − λ) × Re-ranking_Score, where λ is a weighting parameter

(Figure: visual words V1, V2, V6, V4, V4, V8, V9 of a word image projected on the x-axis)

1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012.
2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
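A minimal sketch of LCS-based re-ranking and the linear score combination, assuming each keypoint is available as an (x, y, visual_word) tuple. The helper names and the default λ = 0.5 are hypothetical, and the slides do not say whether the LCS length is normalized (for example by the query length) before it is combined with the index score.

```python
def words_by_x(keypoints):
    """Project (x, y, visual_word) keypoints onto the x-axis: visual words left to right."""
    return [w for _, _, w in sorted(keypoints, key=lambda kp: kp[0])]

def lcs_length(a, b):
    """Length of the longest common subsequence of two visual-word sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]

def final_score(index_score, rerank_score, lam=0.5):
    """Final Score = lam * Index_Score + (1 - lam) * Re-ranking_Score."""
    return lam * index_score + (1 - lam) * rerank_score
```

Unlike the histogram, the LCS is sensitive to the left-to-right order of visual words, so two word images with the same visual words in a different order score lower.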

Dataset Used

Books used for the experiments

Book | #Pages | #Words
Telugu-1716 | 120 | 4121
Telugu-1718 | 100 | 21345
English-1601 | 363 | 113008

Quantitative Results

LLC-based statistics (mAP)

Book | BoVW | BoVW + SIFT Re-ranking | BoVW + LCS Re-ranking | LLC | LLC + LCS Re-ranking
Telugu-1716 | 0.8173 | 0.8645 | 0.9036 | 0.91 | 0.95
Telugu-1718 | 0.7834 | 0.8861 | 0.918 | 0.92 | 0.96
English-1601 | 0.8015 | 0.8531 | 0.92 | 0.8765 | 0.9451

Quantitative Results

Text query based statistics

Book | Method | mAP
Telugu-1716 | Text Query | 0.8413
Telugu-1718 | Text Query | 0.90
English-1601 | Text Query | 0.87

Patch-Based Word Image Retrieval

Page 86: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch

Page 87: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features

Page 88: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features• Profile Feature

Page 89: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features• Profile Feature

• Projection Profile

Page 90: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features• Profile Feature

• Projection Profile• Measures ink distribution of word image

Page 91: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features• Profile Feature

• Projection Profile• Ink Transition

• Measures internal shape of image

Page 92: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

IIIT H

yderabad

Patch Based Word Image Retrieval

• Designed feature based on patch• Representation of Patch using Profile Features• Profile Feature

• Projection Profile• Ink Transition

• Measures internal shape of image

Page 93: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

Patch Based Word Image Retrieval

• Designed a feature based on patches
• Represented each patch using profile features
• Profile features:
  • Projection profile
  • Ink transition
  • Upper word profile: distance from the upper boundary of the word image
  • Lower word profile: distance from the lower boundary of the word image
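The slide names the four profile features but does not define them precisely. Below is a minimal sketch for one binary patch. It assumes ink pixels are 1, that each profile is computed per column, and that values are normalised by the patch height. The function name `profile_features`, the normalisation, and the choice to count only background-to-ink transitions are illustrative assumptions, not the author's exact definitions.

```python
import numpy as np


def profile_features(patch: np.ndarray) -> np.ndarray:
    """Concatenate four per-column profile features of a binary patch.

    Hypothetical sketch: assumes ink == 1 and background == 0, and
    normalises every profile by the patch height h.
    """
    patch = (np.asarray(patch) > 0).astype(np.int32)
    h, w = patch.shape
    has_ink = patch.any(axis=0)

    # Projection profile: fraction of ink pixels in each column.
    projection = patch.sum(axis=0) / h

    # Ink transitions: background-to-ink changes scanning down each column.
    padded = np.vstack([np.zeros((1, w), dtype=np.int32), patch])
    transitions = (np.diff(padded, axis=0) == 1).sum(axis=0) / h

    # Upper word profile: distance from the top boundary to the first ink
    # pixel; a column with no ink gets the full height.
    upper = np.where(has_ink, patch.argmax(axis=0), h) / h

    # Lower word profile: distance from the bottom boundary to the last ink
    # pixel, found by flipping the patch vertically.
    lower = np.where(has_ink, patch[::-1].argmax(axis=0), h) / h

    # One descriptor per patch: the four profiles side by side (length 4 * w).
    return np.concatenate([projection, transitions, upper, lower])
```

For a patch `w` columns wide this yields a `4 * w` vector. Normalising by height keeps patches from words of different sizes comparable. That is a design choice for the sketch, and the slides do not confirm it.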

Page 99: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Overview of Feature Calculation

(Figure) Input word image → patches (. . .) → calculate 4 profile features per patch (projection profile, ink transition, upper word profile, lower word profile) → concatenate the 4 profile features → descriptor
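The figure begins by cutting the word image into patches, but the slides give no patch size or step. The sketch below shows one plausible way to do this. It assumes fixed-width, full-height patches taken left to right. `width` and `stride` are hypothetical parameters rather than values from the talk. A stride smaller than the width produces overlapping patches, in the spirit of the "Patch Feature with Overlap" baseline.

```python
from typing import Callable, List

import numpy as np


def extract_patches(word: np.ndarray, width: int, stride: int) -> List[np.ndarray]:
    """Cut a word image into full-height patches, left to right.

    stride < width gives overlapping patches; stride == width gives disjoint
    ones. Trailing columns that do not fill a whole patch are dropped here.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    h, w = word.shape
    if w <= width:
        return [word]  # narrow word: treat the whole image as one patch
    return [word[:, x:x + width] for x in range(0, w - width + 1, stride)]


def patch_descriptors(word: np.ndarray, width: int, stride: int,
                      describe: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Describe every patch; returns one row per patch, in reading order."""
    return np.stack([describe(p) for p in extract_patches(word, width, stride)])
```

Here `describe` stands in for any per-patch feature, such as the four concatenated profile features. Keeping the patches in reading order preserves the left-to-right structure of the word, which sequence-based re-ranking can use later.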

Page 100: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Fast Pre-Processing

(Figure: lookup-table flowchart) Input patch → corresponding patch vector → is the patch vector present in the lookup table?

• Yes: retrieve the corresponding visual word from the lookup table.
• No: find the corresponding visual word among V1, V2, V3, …, Vk, then update the lookup table.
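The flowchart caches the patch-to-visual-word assignment so that repeated patches skip the nearest-neighbour search. Here is a hypothetical sketch of that idea. It assumes binary patches, so the raw patch bytes can act as an exact, hashable "patch vector" key. A `centroids` array (k × d) plays the role of V1, …, Vk. The class and parameter names are illustrative and do not come from the talk.

```python
from typing import Callable, Dict, Tuple

import numpy as np


class VisualWordLookup:
    """Lookup table mapping a binary patch to its visual-word index.

    Hypothetical sketch: `centroids` (k x d) stands for V1..Vk, and
    `describe` maps a patch to a d-dimensional feature vector.
    """

    def __init__(self, centroids: np.ndarray,
                 describe: Callable[[np.ndarray], np.ndarray]):
        self.centroids = np.asarray(centroids, dtype=float)
        self.describe = describe
        self.table: Dict[Tuple[Tuple[int, ...], bytes], int] = {}
        self.misses = 0  # how many patches needed a full search

    def word_for(self, patch: np.ndarray) -> int:
        patch = (np.asarray(patch) > 0).astype(np.uint8)
        key = (patch.shape, patch.tobytes())  # exact key for a binary patch
        if key in self.table:
            # "Yes" branch: retrieve the stored visual word.
            return self.table[key]
        # "No" branch: find the nearest visual word, then update the table.
        self.misses += 1
        vec = self.describe(patch)
        dists = ((self.centroids - vec) ** 2).sum(axis=1)
        idx = int(dists.argmin())
        self.table[key] = idx
        return idx
```

Exact keys work only because binarised patches repeat often in printed text. With grey-level patches, a hash on raw bytes would rarely hit, and some quantisation of the key would be needed.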

Page 101: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Dataset Used

Book           #Pages   #Words
Telugu-1718    100      21345
English-1601   363      113008

Page 102: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Quantitative Results

Baseline Statistics

Book           Method                        mAP
Telugu-1718    SIFT                          0.7834
Telugu-1718    Patch                         0.53
Telugu-1718    Patch Feature                 0.6183
Telugu-1718    Patch Feature with Overlap    0.7214
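The tables report mean average precision (mAP). For reference, one common way to compute it from ranked retrieval lists with binary relevance is sketched below. The talk does not say how the query instance itself, ties, or the total number of relevant items were handled, so those choices are assumptions.

```python
from typing import Optional, Sequence


def average_precision(relevant: Sequence[bool], n_relevant: Optional[int] = None) -> float:
    """AP of one ranked list: sum of precision@k at each relevant rank k,
    divided by the number of relevant items.

    relevant[i] says whether the item at rank i + 1 matches the query.
    n_relevant is the total number of relevant items in the collection;
    if omitted, only the relevant items present in the list are counted.
    """
    total = sum(bool(r) for r in relevant) if n_relevant is None else n_relevant
    if total == 0:
        return 0.0
    hits, score = 0, 0.0
    for k, r in enumerate(relevant, start=1):
        if r:
            hits += 1
            score += hits / k  # precision at this rank
    return score / total


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """mAP: the average of per-query AP values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranked list with relevant items at ranks 1 and 3 gives AP = (1/1 + 2/3) / 2 ≈ 0.83 when those are the only relevant items.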

Page 103: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Quantitative Results

Enhancements over Baseline Statistics

Enhancement Method      SIFT      Patch Feature
Query Expansion         0.7920    0.75
Spatial Verification    0.8571    0.83
LCS Re-ranking          0.8798    0.8481
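The slides do not detail the LCS re-ranking step. A plausible sketch follows. It assumes each word image is a left-to-right sequence of visual-word indices and that candidates are re-sorted by longest-common-subsequence length, normalised by the longer sequence. The function names and the normalisation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Longest common subsequence length via the classic O(len(a) * len(b)) DP,
    keeping only the previous row of the table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_rerank(query: Sequence[int],
               candidates: List[Tuple[str, Sequence[int]]]) -> List[Tuple[str, Sequence[int]]]:
    """Re-order (id, visual_word_sequence) candidates by normalised LCS score."""
    def score(seq: Sequence[int]) -> float:
        denom = max(len(query), len(seq)) or 1
        return lcs_length(query, seq) / denom

    # sorted() is stable, so ties keep their original (e.g. BoVW) order.
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)
```

Unlike a bag-of-words histogram, LCS rewards visual words appearing in the same order as in the query. This is why such a step can separate words that share characters but spell something different.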

Page 104: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Quantitative Results

Results with Split Features

Book           SIFT    Patch Feature
Telugu-1718    0.94    0.954
English-1601   0.93    0.90

Page 105: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Qualitative Results

Page 106: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Contributions

• Language-independent system
  • Tested on 4 different languages
• Scalable to huge datasets
  • Tested on 1 million word images
• Handles noisy document images
  • Demonstrated performance on a dataset where commercial OCR fails
• Enhancements over baseline results
  • Query expansion
  • Text query support
  • Document-specific sparse coding
• Document-specific descriptor proposed

Page 107: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Future Work

• Test on datasets with different fonts
• Apply similar methods to handwritten and camera-based datasets
• Learn character-level visual words automatically using annotated data
• Multi-keyword support
• Combine recognition-based and recognition-free methods
• Improve the patch-based descriptor

Page 108: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Related Publications

• Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, In Proceedings of 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012.

• Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, In Proceedings of 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012.

• Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, In Proceedings of 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.

Page 109: IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar


Thanks!