final year major project report ( year 2010-2014 batch )

95
SELECTION BY CHOICE SUBMITTED BY: JAIVISH KOTHARI PARUL JINDAL KRITKA TIWARY

Upload: parul6792

Post on 22-Dec-2014

305 views

Category:

Engineering


3 download

DESCRIPTION

Final Year Major Project Presentation Selection By choice interface

TRANSCRIPT

SELECTION BY CHOICE

SUBMITTED BY:

JAIVISH KOTHARI

PARUL JINDAL

KRITKA TIWARY

2

Outline

– Introduction• Why image retrieval is hard• How images are represented• Current approaches

– Indexing and Retrieving Images• Navigational approaches• Relevance Feedback• Automatic Keywording

– Advanced Topics, Futures and Conclusion• Video and music retrieval• Towards practical systems• Conclusions and Feedback

3

Scope

General Digital Still Photographic Image Retrieval– Generally colour

Some different issues arise– Narrower domains

• E.g.Medical images especially where part of body and/or specific disorder is suspected

– Video– Image Understanding - object recognition

4

Thanks to

Chih-Fong Tsai Sharon McDonald Ken McGarry Simon Farrand And members of the University of

Sunderland Information Retrieval Group

IntroductionIntroduction

6

Why is Image Retrieval Hard ?Why is Image Retrieval Hard ?

? What is the topic of What is the topic of this imagethis image

? What are right What are right keywords to index keywords to index this imagethis image

? What words would What words would you use to retrieve you use to retrieve this image ?this image ?

The Semantic GapThe Semantic Gap

7

Problems with Image RetrievalProblems with Image Retrieval

A picture is worth a thousand wordsA picture is worth a thousand words The meaning of an image is highly The meaning of an image is highly

individual and subjectiveindividual and subjective

8

How similar are these two How similar are these two imagesimages

How Images are represented

10

11

12

Compression

• In practice images are stored as compressed raster– Jpeg– Mpeg

• Cf Vector …

• Not Relevant to retrieval

13

Image Processing for Retrieval

• Representing the Images– Segmentation – Low Level Features

• Colour• Texture• Shape

14

Image Features

• Information about colour or texture or shape which are extracted from an image are known as image features– Also a low-level features

• Red, sandy

– As opposed to high level features or concepts• Beaches, mountains, happy, serene, George Bush

15

Image Segmentation

• Do we consider the whole image or just part ?– Whole image - global features– Parts of image - local features

16

Global features• Averages across whole image Tends to loose distinction between foreground

and background Poorly reflects human understanding of images Computationally simple A number of successful systems have been built

using global image features including Sunderland’s CHROMA

17

Local Features

• Segment images into parts

• Two sorts:– Tile Based– Region based

18

Regioning and Tiling Schemes

(a) 5 tiles (b) 9 tiles

(c) 5 regions (d) 9 regions

Tiles

Regions

19

Tiling Break image down into simple geometric

shapes Similar Problems to Global

Plus dangers of breaking up significant objects

Computational Simple Some Schemes seem to work well in practice

20

Regioning

• Break Image down into visually coherent areas

Can identify meaningful areas and objects

Computationally intensive Unreliable

21

Colour

• Produce a colour signature for region/whole image

• Typically done using colour correllograms or colour histograms

22

Colour HistogramsIdentify a number of buckets in which to sort the available colours (e.g. red green and blue, or up to ten or so colours) Allocate each pixel in an image to a bucket and count the number of pixels in each bucket.Use the figure produced (bucket id plus count, normalised for image size and resolution) as the index key (signature) for each image.

23

Global Colour Histogram

0

10

20

30

40

50

60

70

80

90

Red Orange

24

Other Colour Issues

• Many Colour Models– RGB (red green blue)– HSV (Hue Saturation Value)– Lab, etc. etc.

• Problem is getting something like human vision– Individual differences

25

Texture

• Produce a mathematical characterisation of a repeating pattern in the image– Smooth– Sandy– Grainy– Stripey

26

27

28

Texture

• Reduces an area/region to a (small - 15 ?) set of numbers which can be used a signature for that region.

• Proven to work weel in practice

• Hard for people to understand

29

Shape

• Straying into the realms of object recognition

• Difficult and Less Commonly used

30

Ducks again

• All objects have closed boundaries

• Shape interacts in a rather vicious way with segmentation

• Find the duck shapes

31

32

Summary of Image Representation

• Pixels and Raster• Image Segmentation

– Tiles

– Regions

• Low-level Image Features– Colour

– Texture

– Shape

Indexing and Retrieving Indexing and Retrieving ImagesImages

3434

Overview of Section 2Overview of Section 2

Quick Reprise on IRQuick Reprise on IR Navigational ApproachesNavigational Approaches Relevance FeedbackRelevance Feedback Automatic Keyword AnnotationAutomatic Keyword Annotation

3535

Reprise on Key Interactive IR Reprise on Key Interactive IR ideasideas

Index Time vs Query Time ProcessingIndex Time vs Query Time Processing Query TimeQuery Time

Must be fast enough to be interactiveMust be fast enough to be interactive Index (Crawl) TimeIndex (Crawl) Time

Can be slow(ish) Can be slow(ish) There to support retrievalThere to support retrieval

3636

An IndexAn Index

A data structure which stores data in a suitably A data structure which stores data in a suitably abstracted and compressed form in order to abstracted and compressed form in order to faciliate rapid processing by an applicationfaciliate rapid processing by an application

3737

Indexing ProcessIndexing Process

Navigational Approaches Navigational Approaches to Image Retrievalto Image Retrieval

3939

Essential IdeaEssential Idea

Layout images in a virtual space in an Layout images in a virtual space in an arrangement which will make some sense to arrangement which will make some sense to the userthe user

Project this onto the screen in a Project this onto the screen in a comprehensible formcomprehensible form

Allow them to navigate around this projected Allow them to navigate around this projected space (scrolling, zooming in and out)space (scrolling, zooming in and out)

4040

NotesNotes

Typically colour is usedTypically colour is used Texture has proved difficult for people to Texture has proved difficult for people to

understandunderstand Shape possibly the same, and also user interface - Shape possibly the same, and also user interface -

most people can’t draw !most people can’t draw ! Alternatives include time (Canon’s Time Alternatives include time (Canon’s Time

Tunnel) and recently location (GPS Cameras)Tunnel) and recently location (GPS Cameras) Need some means of knowing where you areNeed some means of knowing where you are

4141

ObservationObservation

It appears people can take in and will inspect It appears people can take in and will inspect many more images than texts when searcingmany more images than texts when searcing

4242

CHROMACHROMA

Development in Sunderland: Development in Sunderland: mainly by Ting Sheng Lai now of National Palace mainly by Ting Sheng Lai now of National Palace

Museum, Taipei, TaiwanMuseum, Taipei, Taiwan Structure Navigation SystemStructure Navigation System

Thumbnail ViewerThumbnail Viewer

Similarity SearchingSimilarity Searching

Sketch ToolSketch Tool

4343

The CHROMA SystemThe CHROMA System

General Photographic ImagesGeneral Photographic Images

Global Colour is the Primary Indexing KeyGlobal Colour is the Primary Indexing Key

Images organised in a hierarchical Images organised in a hierarchical

classification using 10 colour descriptorsclassification using 10 colour descriptors and and

colour histogramscolour histograms

4444

Access SystemAccess System

4545

The Navigation ToolThe Navigation Tool

4646

Technical IssuesTechnical Issues

Fairly Easy to arrange image signatures so Fairly Easy to arrange image signatures so they support rapid browsing in this spacethey support rapid browsing in this space

Relevance FeedbackRelevance Feedback

More Like thisMore Like this

4848

Relevance FeedbackRelevance Feedback

Well established technique in text retrievalWell established technique in text retrieval Experimental results have always shown it to work Experimental results have always shown it to work

well in practicewell in practice Unfortunately experience with search engines Unfortunately experience with search engines

has show it is difficult to get real searchers to has show it is difficult to get real searchers to adopt it - too much interactionadopt it - too much interaction

4949

Essential IdeaEssential Idea

User performs an initial queryUser performs an initial query Selects some relevant resultsSelects some relevant results System then extracts terms from these to System then extracts terms from these to

augment the initial query augment the initial query RequeriesRequeries

5050

Many VariantsMany Variants

PseudoPseudo Just assume high ranked documents are relevantJust assume high ranked documents are relevant

Ask users about terms to useAsk users about terms to use Include negative evidenceInclude negative evidence Etc. etc.Etc. etc.

5151

Query-by-Image-ExampleQuery-by-Image-Example

5252

Why useful in Image Retrieval?Why useful in Image Retrieval?

1.1. Provides a bridge between the users Provides a bridge between the users understanding of images and the low level understanding of images and the low level features (colour, texture etc.) with which the features (colour, texture etc.) with which the systems is actually operatingsystems is actually operating

2.2. Is relatively easy to interface to Is relatively easy to interface to

5353

Image Retrieval ProcessImage Retrieval Process

Ducks

Green

Water Texture

Leaf Texture

5454

ObservationsObservations

Most image searchers prefer to use key words Most image searchers prefer to use key words to formulate initial queriesto formulate initial queries Eakins et al, Enser et alEakins et al, Enser et al

First generation systems all operated using low First generation systems all operated using low level features onlylevel features only Colour, texture, shape etc.Colour, texture, shape etc. Smeulders et alSmeulders et al

5555

Ideal Image Retrieval ProcessIdeal Image Retrieval Process

Thumbnail Browsing

Need KeywordQuery

More Like this

5656

Image Retrieval as Text RetrievalImage Retrieval as Text Retrieval

What we really want to do is make the image What we really want to do is make the image retrieval problem text retrievalretrieval problem text retrieval

5757

Three Ways to goThree Ways to go

Manually Assign Keywords to each imageManually Assign Keywords to each image Use text associated with the images (captions, Use text associated with the images (captions,

web pages)web pages) Analyse the image content to automatically Analyse the image content to automatically

assign keywordsassign keywords

5858

Manual KeywordingManual Keywording

ExpensiveExpensive Can only really be justified for high value Can only really be justified for high value

collections – advertisingcollections – advertising UnreliableUnreliable

Do the indexers and searchers see the images in Do the indexers and searchers see the images in the same waythe same way

FeasibleFeasible

5959

Associated TextAssociated Text

CheapCheap PowerfulPowerful

Famous names/incidentsFamous names/incidents Tends to be “one dimensional”Tends to be “one dimensional”

Does not reflect the content rich nature of imagesDoes not reflect the content rich nature of images Currently Operational - GoogleCurrently Operational - Google

6060

Possible SourcesPossible Sourcesof Associated textof Associated text

FilenamesFilenames Anchor TextAnchor Text Web Page Text around the anchor/where the Web Page Text around the anchor/where the

image is embeddedimage is embedded

6161

Automatic Keyword AssignmentAutomatic Keyword Assignment

A form of Content Based Image RetrievalA form of Content Based Image Retrieval

Cheap (ish)Cheap (ish) Predictable (if not always “right”)Predictable (if not always “right”) No operational System DemonstratedNo operational System Demonstrated

Although considerable progress has been made Although considerable progress has been made recentlyrecently

6262

Basic ApproachBasic Approach

Learn a mapping from the low level image Learn a mapping from the low level image features to the words or concepts features to the words or concepts

6363

Two RoutesTwo Routes

1.1. Translate the image into piece of textTranslate the image into piece of textn Forsyth and other sForsyth and other sn Manmatha and othersManmatha and others

2.2. Find that category of images to which a Find that category of images to which a keyword applieskeyword applies

n Tsai and TaitTsai and Taitn (SIGIR 2005)(SIGIR 2005)

6464

Second Session SummarySecond Session Summary

Separating Index Time and Retrieval Time Separating Index Time and Retrieval Time OperationsOperations

““First generation CBIR”First generation CBIR” Navigation (by colour etc.)Navigation (by colour etc.) Relevance FeedbackRelevance Feedback

Keyword based RetrievalKeyword based Retrieval Manual IndexingManual Indexing Associated TextAssociated Text Automatic Keywording Automatic Keywording

Advanced Topics, Futures and Conclusions

66

Outline

Video and Music Retrieval Towards Practical Systems Conclusions and Feedback

Video and Music Retrieval

68

Video Retrieval

• All current Systems are based on one or more of:– Narrow domain - news, sport– Use automatic speech recognition to do

speech to text on the soundtrack– Do key frame extraction and then treat the

problem as still image retrieval

69

Missing Opportunities in Video Retrieval

• Using delta’s - frame to frame differences - to segment the image into foreground/background, players, pitch, crowd etc.

• Trying to relate image data to language/text data

70

Music Retrieval

• Distinctive and Hard Problem– What makes one piece of music similar to

another

• Features– Melody– Artist– Genre ?

Towards Practical Systems

72

Ideal Image Retrieval ProcessThumbnail Browsing

Need KeywordQuery

More Like this

73

Requirements

> 5000 Key word vocabulary > 5% accuracy of keyword assignment for all

keywords> 5% precision in response to single key word queries

The Semantic Gap Bridged!

74

CLAIRE

Example State of the Art Semantic CBIR System Colour and Texture Features Simple Tiling Scheme Two Stage Learning Machine

SVM/SVM and SVM/k-NNColour to 10 basic coloursTexture to one texture term per category

75

Tiling Scheme

76

TextureClassifier

TextureClassifier

Architecture of ClaireIm

age

Seg

men

tati

on Col

our

Tex

ture K

ey w

ord

Ann

otat

ion

DataExtractor

DataExtractor

Known Key Word/classKnown Key Word/class

77

Training/Test Collection

Randomly Selected from Corel Training Set

30 images per category Test Collection

20 images per category

78

SVM/SVM Keywording with 100+50 Categories

0%

10%

20%

30%

40%

50%

60%

70%

10 30 50 70 100

concrete classesabstract classesbaseline

79

Examples Keywords Concrete

Beaches Dogs Mountain Orchids Owls Rodeo Tulips Women

Abstract Architecture City Christmas Industry Sacred Sunsets Tropical Yuletide

80

SVM vs kNN

0%

10%

20%

30%

40%

50%

60%

70%

10 30 50 70 100 150

SVM concreteSVM abstractbaselinekNN abstractkNN concrete

81

Reduction in Unreachable Classes

0

10

20

30

40

50

60

10 30 50 70 100 150

Missing Category Numbers

SVM concreteSVM abstractkNN concretekNN abstract

82

Labelling Areas of Feature Space

Mountain

Sea

Tree

83

Overlap in Feature Space

84

Keywording 200+200 Categories

SVM/1-NN

0%

10%

20%

30%

40%

50%

60%

10 30 50 70 100 150 200

concrete keywords

abstract keywords

baseline

Expon. (abstractkeywords)

85

Discussion Results still promising 5.6% of images have at least one

relevant keyword assigned Still useful - but only for a vocabulary of 400 words !

See demo at http://osiris.sunderland.ac.uk/~da2wli/system/silk1/ High proportion of categories which are never assigned

86

Segmentation

Are the results dependent on the specific tiling/regioning scheme used ?

87

Regioning

(a) 5 tiles (b) 9 tiles

(c) 5 regions (d) 9 regions

88

Effectiveness Comparison

8% (81)13.7% (25)

16.7% (15)

21.43% (5)

27.9% (2)

36.67% (0)

52.5% (0)

14.3% (30)

9.13% (69)

27.79% (1)

33% (0)

40.67% (1)

61.5% (0)

18.4% (9)

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

10 30 50 70 100 150 200

No. of concrete classes

Ac

cu

rac

y

tiles

regions

8% (81)8.86% (44)

11.25% (21)

16.29% (7)

22.55% (2)26.33% (2)

52.5% (0)

9% (51) 9.13% (69)

9.25% (27)

14.14% (7)

31% (0)

21.06% (0)

48% (0)

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

10 30 50 70 100 150 200

No. of abstract classes

Ac

cu

rac

y

tiles

regionsFive Tiles vs Five Regions1-NN Data Extractor

89

Next Steps

More categories Integration into complete systems Systematic Comparison with Generative approach

pioneered by Forsyth and others

90

Other Promising Examples Jeon, Manmatha and others -

High number of categories - results difficult to interpret Carneiro and Vasconcelos

Also problems with missing concepts Srikanth et al

Possibly leading results in terms of precision and vocabulary scale

91

Conclusions

Image Indexing and Retrieval is Hard Effective Image Retrieval needs a cheap and

predictable way of relating words and images Adaptive and Machine Learning approaches offer

one way forward with much promise

Feedback

Comments and Questions

Selected BibliographySelected Bibliography

94

Early SystemsEarly SystemsThe following leads into all the major trends in systems based on colour, The following leads into all the major trends in systems based on colour,

texture and shapetexture and shape A. Smeaulder, M. Worring, S. Santini, A. Gupta and R. Jain “Content-based Image Retrieval: the end A. Smeaulder, M. Worring, S. Santini, A. Gupta and R. Jain “Content-based Image Retrieval: the end

of the early years” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-of the early years” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.1380, 2000.

CHROMACHROMA Sharon McDonald and John TaitSharon McDonald and John Tait “Search Strategies in Content-Based Image Retrieval” Proceedings “Search Strategies in Content-Based Image Retrieval” Proceedings

of the 26of the 26thth ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. pp 80-87. ISBN 1-58113-646-32003), Toronto, July, 2003. pp 80-87. ISBN 1-58113-646-3

Sharon McDonald, Ting-Sheng Lai and Sharon McDonald, Ting-Sheng Lai and John Tait, John Tait, “Evaluating a Content Based Image Retrieval “Evaluating a Content Based Image Retrieval System” Proceedings of the 24System” Proceedings of the 24 thth ACM SIGIR Conference on Research and Development in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001), New Orleans, September 2001. W.B. Croft, D.J. Harper, D.H. Information Retrieval (SIGIR 2001), New Orleans, September 2001. W.B. Croft, D.J. Harper, D.H. Kraft, and J. Zobel (Eds). ISBN 1-58113-331-6 pp 232-240Kraft, and J. Zobel (Eds). ISBN 1-58113-331-6 pp 232-240 ..

Translation Based ApproachesTranslation Based Approaches P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth “Learning a Lexicon for a Fixed Image P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth “Learning a Lexicon for a Fixed Image

Vocabulary” European Conference on Computer Vision, 2002.Vocabulary” European Conference on Computer Vision, 2002. K. Barnard, P. Duygulu, N. de Freitas and D. Forsyth “Matching Words and Pictures” Journal of K. Barnard, P. Duygulu, N. de Freitas and D. Forsyth “Matching Words and Pictures” Journal of

machine Learning Research 3: 1107-1135, 2003.machine Learning Research 3: 1107-1135, 2003.

Very recent new paper on this is:Very recent new paper on this is: P. Virga, P. Duygulu “Systematic Evaluation of Machine Translation Methods for Image and Video P. Virga, P. Duygulu “Systematic Evaluation of Machine Translation Methods for Image and Video

Annotation” Images and Video Retrieval, Proceedings of CIVR 2005, Singapore, Springer, 2005.Annotation” Images and Video Retrieval, Proceedings of CIVR 2005, Singapore, Springer, 2005.

95

Cross-media Relevance Models etcCross-media Relevance Models etc J. Jeon, V. Lavrenko, R. Manmatha “Automatic Image Annotation and Retrieval using Cross-Media J. Jeon, V. Lavrenko, R. Manmatha “Automatic Image Annotation and Retrieval using Cross-Media

Relevance Models” Relevance Models” Proceedings of the 26Proceedings of the 26 thth ACM SIGIR Conference on Research and Development ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, July, 2003. Pp 119-126in Information Retrieval (SIGIR 2003), Toronto, July, 2003. Pp 119-126

See also recent unpublished papers onSee also recent unpublished papers on

http://ciir.cs.umass.edu/~manmatha/mmpapers.html

More recent stuffMore recent stuff G Carneiro and N. Vasconcelos “A Database Centric View of Sentic Image Annotation and Retrieval” G Carneiro and N. Vasconcelos “A Database Centric View of Sentic Image Annotation and Retrieval”

Proceedings of the 28Proceedings of the 28 thth ACM SIGIR Conference on Research and Development in Information ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005

M. Srikanth, J. Varner, M. Bowden, D. Moldovan “Exploiting Ontologies for Automatic Image M. Srikanth, J. Varner, M. Bowden, D. Moldovan “Exploiting Ontologies for Automatic Image Annotation” Proceedings of the 28Annotation” Proceedings of the 28 thth ACM SIGIR Conference on Research and Development in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005Information Retrieval (SIGIR 2005), Salvador, Brazil, August, 2005

See also the SIGIR workshop proceedingsSee also the SIGIR workshop proceedings

http://mmir.doc.ic.ac.uk/mmir2005http://mmir.doc.ic.ac.uk/mmir2005