CISC 7610 Lecture 9: Image retrieval - Michael I...
TRANSCRIPT
Topics:
● How hard is computer vision?
● Image retrieval tasks
● Indexing methods
● Query by image: near-exact match
● Classical image classification
● Convolutional neural network classification
● Image retrieval corpora
How hard is computer vision?
Zitnik, U. Washington, CSE P 576: Computer Vision, Lecture 1, https://courses.cs.washington.edu/courses/csep576/11sp/pdf/Intro.pdf
Marvin Minsky, MIT. Turing Award, 1969
“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a television camera to a computer and get the machine to describe what it sees.” (Crevier 1993, pg. 88)
How hard is computer vision?
Gerald Sussman, MIT. Panasonic Professor of Electrical Engineering
“You’ll notice that Sussman never worked in vision again!” – Berthold Horn
Image retrieval tasks
● Query by description
● Query by image: near-exact matches
● Query by image: similar images
● Desired properties of an image retrieval system
Query by description (Google image search)
Query by image: similar images (Google image search)
Query by image: near-exact matches (Amazon A9 Flow)
Desired properties of an image retrieval system
● Invariance to – rotation, scaling, cropping
● Decoupling of – illumination, pose, background, occlusion, intra-class variability, viewpoint
Lexing Xie, Columbia EE6882 Lecture 2, http://www.ee.columbia.edu/~sfchang/course/svia/slides/lecture2.pdf
Image indexing methods
● Text around images – Captions, articles, descriptions, metadata
● Folksonomy / human tags – Provided by people to organize their own photos
● Games with a purpose – Provide additional incentive for humans to label images
● Autotagging: automatically classify images – Hardest, but most scalable
Text around images
Folksonomy / human tags
Games with a purpose: ESP Game, Google Image Labeler
Autotagging: automatic classification (Behold image search)
Query by image: near-exact match – SIFT features
● Compute salient points in image
● Characterize them with invariant features
● Index them with a text search engine
● Enforce geometric constraints after retrieval
Rueger, “Multimedia Information Retrieval” Lecture 2 www.nii.ac.jp/userimg/lectures/20120319/Lecture2.pdf
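The four steps above can be sketched end-to-end with toy data. Everything below is hypothetical for illustration: real systems quantize 128-dimensional SIFT descriptors against a large learned vocabulary of "visual words", and the 2-D vectors, vocabulary, and image ids here are stand-ins.

```python
import math

def nearest_word(desc, vocab):
    """Quantize a descriptor to the index of its nearest 'visual word' centroid."""
    return min(range(len(vocab)),
               key=lambda i: math.dist(desc, vocab[i]))

def build_index(image_descs, vocab):
    """Inverted index (as in a text search engine): word id -> image ids."""
    index = {}
    for img_id, descs in image_descs.items():
        for d in descs:
            index.setdefault(nearest_word(d, vocab), set()).add(img_id)
    return index

def query(descs, vocab, index):
    """Rank images by how many of the query's visual words they share."""
    votes = {}
    for d in descs:
        for img_id in index.get(nearest_word(d, vocab), ()):
            votes[img_id] = votes.get(img_id, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

# Toy 2-D "descriptors": a tiny vocabulary and two indexed images
vocab = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
db = {"a": [(0.1, 0.0), (0.9, 1.1)], "b": [(5.1, 4.9)]}
index = build_index(db, vocab)
print(query([(0.0, 0.1), (1.1, 0.9)], vocab, index))  # image "a" ranks first
```

The final geometric-constraint step (rejecting matches whose keypoints are not spatially consistent) is omitted from this sketch.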
SIFT: Scale-Invariant Feature Transform
● Image features that can be used to match different views of the same object
● Robust to substantial changes in illumination, scale, rotation, viewpoint, noise
● Lowe, D.G. (2004). “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision, 60, 2, pp. 91-110.
SIFT Algorithm
● Detect scale space extrema
● Localize candidate keypoints
● Assign an orientation to each keypoint
● Produce keypoint descriptor
Detect scale space extrema: Scale space
● Scale space: a representation of the image as it is progressively shrunk
● Provides invariance to the size of the object / image
● Built by repeatedly smoothing and shrinking the image
Rueger, “Multimedia Information Retrieval” Lecture 2 www.nii.ac.jp/userimg/lectures/20120319/Lecture2.pdf
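A minimal NumPy sketch of the smooth-and-shrink loop, using a separable Gaussian blur. The kernel radius, sigma, and number of levels are arbitrary illustrative choices, not Lowe's exact parameters.

```python
import numpy as np

def gaussian_kernel(sigma, radius=2):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def pyramid(img, levels=3, sigma=1.0):
    """Repeatedly smooth and halve the image."""
    out = [img]
    for _ in range(levels - 1):
        img = blur(img, sigma)[::2, ::2]   # smooth, then drop every other row/col
        out.append(img)
    return out

levels = pyramid(np.random.rand(64, 64))
print([l.shape for l in levels])  # [(64, 64), (32, 32), (16, 16)]
```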
Detect scale space extrema: Example smoothed images
Detect scale space extrema: Compute differences between scales
[Figure: Gaussian-smoothed images at each scale within an octave; subtracting adjacent scales yields the Difference-of-Gaussian images]
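The difference images can be sketched as subtractions of successively blurred copies of the image. The sigma schedule below is illustrative, not Lowe's exact values.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Crude separable Gaussian blur with a small fixed-radius kernel."""
    x = np.arange(-2, 3)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def dog_stack(img, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Blur at increasing sigmas, then subtract adjacent blurred images."""
    blurred = [gaussian_blur(img, s) for s in sigmas]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

dogs = dog_stack(np.random.rand(32, 32))
print(len(dogs))  # 3 difference images from 4 blur levels
```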
Detect scale space extrema: Example difference images
Localize candidate keypoints
● Seek extrema not only in x and y, but also in scale
● That is, the scale just before a feature is blurred away by the smoothing
● Find points greater (or smaller) than all 26 of their neighbors in space and scale
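The neighbor test can be sketched directly: a candidate keypoint must beat all 26 neighbors in its 3x3x3 neighborhood across scale, y, and x. The toy DoG stack below is synthetic.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly greater or strictly smaller
    than all 26 neighbors in its 3x3x3 (scale, y, x) neighborhood."""
    cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]
    center = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)   # drop the center itself
    return bool((center > others).all() or (center < others).all())

dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0                         # a lone peak across scale and space
print(is_extremum(dog, 1, 2, 2))  # True
```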
Assign an orientation to each keypoint and produce descriptor
● Find “orientation” at each pixel
● Compute histogram of these orientations over pixels around the keypoint
● Align it to the dominant direction
● Provides robustness to rotation, pose, lighting
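A rough sketch of the orientation histogram for a patch, using 36 bins of 10 degrees as in Lowe's paper and weighting each pixel's vote by its gradient magnitude. The patch here is synthetic.

```python
import numpy as np

def orientation_histogram(patch, bins=36):
    """Gradient-orientation histogram over a patch, weighted by magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360
    hist, _ = np.histogram(ang, bins=bins, range=(0, 360), weights=mag)
    return hist

# A patch with a pure horizontal intensity ramp: gradients point along +x (0 deg)
patch = np.tile(np.arange(8.0), (8, 1))
hist = orientation_histogram(patch)
print(np.argmax(hist))  # dominant bin is the first one (near 0 degrees)
```

Aligning the descriptor to the dominant bin (the argmax above) is what gives the rotation robustness described in the slide.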
SIFT Retrieval example
(Lowe, 2004)
Classical image tagging: Features
● Color features
– Color histograms
– Color histograms in other color spaces
● Texture features
– Tamura texture features
Grayscale histograms
Rueger, “Multimedia Information Retrieval” Lecture 5 www.nii.ac.jp/userimg/lectures/20120319/Lecture5.pdf
3D Color histograms
● Count how many times each color appears
● Usually want to quantize colors first
● Ignores where in the image each color appears
Rueger, “Multimedia Information Retrieval” Figure 3.3. Morgan & Claypool: 2010.
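The count-after-quantizing idea fits in a few lines of plain Python. The 4-levels-per-channel quantization and the toy pixel list are arbitrary example choices.

```python
from collections import Counter

def color_histogram_3d(pixels, levels=4):
    """Quantize each RGB channel to `levels` bins, then count how often
    each quantized color occurs (where it occurs in the image is ignored)."""
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

# Toy "image": three mostly-red pixels plus one blue one
pixels = [(250, 10, 10)] * 3 + [(10, 10, 250)]
hist = color_histogram_3d(pixels)
print(hist)  # Counter({(3, 0, 0): 3, (0, 0, 3): 1})
```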
Color histogram example
● Draw a 3D color histogram for the following image
● Draw a color histogram for each channel
● Which one better characterizes the content?
R    G    B    Color
0    0    0    black
255  0    0    red
0    255  0    green
0    0    255  blue
0    255  255  cyan
255  0    255  magenta
255  255  0    yellow
255  255  255  white
Color histograms in other color spaces: HSL, HSV
● Hue-Saturation-Lightness / Value
● Separates color into more meaningful axes
● Hue: the color itself
● Saturation: color purity (vivid vs. washed-out / gray)
● Lightness / Value: position along the black / white axis
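Python's standard library already provides the RGB-to-HSV conversion; `colorsys` works on floats in [0, 1], so 8-bit values are scaled first.

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v

print(rgb_to_hsv(255, 0, 0))      # (0.0, 1.0, 1.0): pure saturated red
print(rgb_to_hsv(128, 128, 128))  # gray: saturation 0, hue meaningless
```

Histograms built on hue and saturation often separate "what color" from "how bright" better than raw RGB counts.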
Tamura texture features
● Texture is a property of image regions, not pixels
● Perceptual experiments yielded a small set of descriptors that capture how people see texture
● Can attempt to replicate those computationally
Tamura texture features
● Compute texture features on image
● Create 3D histogram like color histogram
Rueger, “Multimedia Information Retrieval” Figure 3.5. Morgan & Claypool: 2010.
[Figure 3.5: the three features – coarseness, contrast, directionality]
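As one concrete example, Tamura's contrast feature is commonly computed as the standard deviation of gray levels normalized by the fourth root of the kurtosis; a sketch under that definition (the toy patches are illustrative):

```python
import math

def tamura_contrast(values):
    """Tamura contrast: std deviation divided by the fourth root of
    kurtosis (mu4 / sigma^4), reflecting both the spread and the
    shape of the gray-level distribution."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    mu4 = sum((v - mean) ** 4 for v in values) / n
    kurtosis = mu4 / var**2
    return math.sqrt(var) / kurtosis**0.25

# A high-contrast (bimodal) patch vs. a perfectly flat one
print(tamura_contrast([0, 0, 255, 255]))  # 127.5
print(tamura_contrast([128] * 4))         # 0.0
```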
Classical image tagging: Classification
Shih-Fu Chang, Columbia EE6882 Lecture 1, http://www.ee.columbia.edu/~sfchang/course/svia/slides/lecture1.pdf
Modern image tagging: Convolutional neural networks
● Combination of filtering with pooling
● Filters are learned to optimize classification
● Online demos: – http://yann.lecun.com/exdb/lenet/
– http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
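The filtering-plus-pooling combination can be illustrated with a tiny NumPy sketch. The edge filter and image are toy examples; in a real CNN the filter values are learned to optimize classification rather than hand-chosen.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid sliding-window filtering (cross-correlation, as in most CNNs)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep the largest value in each block."""
    h, w = img.shape[0] // size * size, img.shape[1] // size * size
    return img[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A hand-written vertical-edge filter applied to an image with a vertical edge
img = np.zeros((6, 6))
img[:, 3:] = 1.0
edge_filter = np.array([[-1.0, 1.0]])
features = np.maximum(conv2d(img, edge_filter), 0)   # ReLU nonlinearity
print(max_pool(features).shape)  # (3, 2)
```

The pooled map still responds where the edge is, but at half the resolution; stacking such filter/pool stages is the basic structure of the networks in the demos above.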
Image retrieval corpora
● Pascal visual object classes (VOC)
● Imagenet
● MS common objects in context (COCO)
● Places2
Pascal visual object classes
● 20 categories
● 50k images
● Localize and classify objects
● Ran 2007-2012
ImageNet – http://www.image-net.org
● 1000 categories
● 1.2 million images
● Images of nouns in WordNet
● Several related challenges
MS Common Objects in Context (COCO) – http://mscoco.org/
● 91 object types that would be easily recognizable by a 4 year old
● 330k images, 2.5 million labeled instances
● Objects in real context
Places2 – http://places2.csail.mit.edu/
● Recognize places / scenes, not objects
● Setting for where objects will appear
● 400 scene types, 10M images
Summary
● Computer vision is hard
● Labels can come directly from humans or via autotagging models
● Fingerprinting supports near-exact matching
● Classical image classification uses hand-designed features with a learned classifier
● Convolutional neural networks learn both the features and the classifiers
● Several large image retrieval corpora have recently been released