Post on 30-Aug-2020


Page 1: kmi.open.ac - Uni Koblenz-Landau · Web-based image searching Best current practice is a text search: Find text in filename, anchor text, caption, ... Text search works by creating

Multimedia Information Retrieval

Prof Stefan Rüger

Multimedia and Information Systems

Knowledge Media Institute

The Open University

http://kmi.open.ac.uk/mmis

kmi.open.ac.uk

Since 1995: 117 projects & 67 technologies

Current year

17 live projects; typically £2.5m external, £1m internal

• 10 EU
• 3 UK
• 1 US
• 3 internal (iTunes U, SocialLearn)


Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


The Twelve Collegia building on Vasilievsky Island in Saint Petersburg is the university's main building and the seat of the rector and administration (the building was constructed on the orders of Peter the Great)

Multimedia queries


Web-based image searching

Best current practice is a text search: find text in filename, anchor text, caption, ...

Text search works by creating a large index:
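As a minimal sketch of such an index, assuming each image is described by a single string of text (filename, anchor text and caption joined together), an inverted index maps every word to the images whose text contains it. The image ids and descriptions below are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased word to the set of image ids whose text contains it."""
    index = defaultdict(set)
    for image_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index

def search(index, query):
    """Return ids of images whose text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)

# Hypothetical image descriptions (filename, anchor text, caption joined together).
docs = {
    "img1": "sunset.jpg sunset over the city",
    "img2": "tree.jpg oak tree in a field",
    "img3": "city.jpg aerial view of the city",
}
index = build_index(docs)
print(search(index, "city"))
```

Real engines add ranking, stemming and stop-word handling on top of this lookup structure.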

New search types

Query modes: location, sound, humming, motion, text, image, speech

Document types: text, video, images, speech, music, sketches, multimedia

Examples:
- conventional text retrieval
- hum a tune and get a music piece
- you roar and get a wildlife documentary
- type “floods” and get BBC radio news


Exercise

Organise yourselves in groups and discuss with your neighbours:
- Two examples for different query/doc modes?
- How hard is this? Which techniques are involved?
- One example combining different modes

[Figure: query/doc matrix of query modes (location, sound, humming, motion, text, image, speech) against document types (text, video, images, speech, music, sketches, multimedia)]


Leaf detection

What are the challenges?

[with Natural History Museum, London, and Goldsmiths]


Venation pattern and shape

Shape is key

[with Frederic Fol Leymarie, Goldsmiths, 2011]


The semantic gap

1m pixels with a spatial colour distribution

faces & vase-like object


Polysemy


Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


Metadata

Dublin Core

simple common denominator: 15 elements such as title, creator, subject, description, …

METS

Metadata Encoding and Transmission Standard

MARC 21

MAchine Readable Cataloguing (harmonised)

MPEG-7

Multimedia specific metadata standard


MPEG-7

● Moving Picture Experts Group “Multimedia Content Description Interface”

● Not an encoding method like MPEG-1, MPEG-2 or MPEG-4!

● Usually represented in XML format

● Full MPEG-7 description is complex and comprehensive

● Detailed Audiovisual Profile (DAVP)

[P Schallauer, W Bailer, G Thallinger, “A description infrastructure for audiovisual media processing systems based on MPEG-7”, Journal of Universal Knowledge Management, 2006]


MPEG-7 example

<Mpeg7 xsi:schemaLocation="urn:mpeg:mpeg7:schema:2004 ./davp-2005.xsd" ... >
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="AudioVisualType">
      <AudioVisual>
        <StructuralUnit href="urn:x-mpeg-7-pharos:cs:AudioVisualSegmentationCS:root"/>
        <MediaSourceDecomposition criteria="kmi image annotation segment">
          <StillRegion>
            <MediaLocator><MediaUri>http://...392099.jpg</MediaUri></MediaLocator>
            <StructuralUnit href="urn:x-mpeg-7-pharos:cs:SegmentationCS:image"/>
            <TextAnnotation type="urn:x-mpeg-7-pharos:cs:TextAnnotationCS:image:keyword:kmi:annotation_1" confidence="0.87">
              <FreeTextAnnotation>tree</FreeTextAnnotation>
            </TextAnnotation>
            <TextAnnotation type="urn:x-mpeg-7-pharos:cs:TextAnnotationCS:image:keyword:kmi:annotation_2" confidence="0.72">
              <FreeTextAnnotation>field</FreeTextAnnotation>
            </TextAnnotation>
          </StillRegion>
        </MediaSourceDecomposition>
      </AudioVisual>
    </MultimediaContent>
  </Description>
</Mpeg7>


Digital libraries

Manage document repositories and their metadata

Greenstone digital library suite

http://www.greenstone.org/

- interface in 50+ languages (documented in 5)
- knows metadata
- understands multimedia

XML or text retrieval


Piggy-back retrieval

[Figure: query/doc matrix of query modes (location, sound, humming, motion, text, image, speech) against document types (text, video, images, speech, music, sketches, multimedia), with the additional label “text”]


Music to text

0 +7 0 +2 0 -2 0 -2 0 -1 0 -2 0 +2 -4

ZBZb

ZGZB

GZBZ

Z G Z B Z b Z b Z a Z b Z B d

[with Doraisamy, J of Intellig Inf Systems 21(1), 2003; Doraisamy PhD thesis 2004]
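The letter string above can be reproduced with a small sketch. The mapping is inferred from the example on this slide only (0 becomes Z, +k becomes the k-th upper-case letter, -k the k-th lower-case letter), and the reading of the numbers as pitch intervals is an assumption; the function names are hypothetical and not taken from the cited work.

```python
def interval_to_letter(interval):
    """0 -> 'Z'; +k -> k-th upper-case letter; -k -> k-th lower-case letter."""
    if interval == 0:
        return "Z"
    if not 1 <= abs(interval) <= 25:
        raise ValueError("interval outside the range covered by this sketch")
    base = ord("A") if interval > 0 else ord("a")
    return chr(base + abs(interval) - 1)

def encode(intervals):
    """Turn a sequence of intervals into a string of letters."""
    return "".join(interval_to_letter(i) for i in intervals)

def ngrams(text, n=4):
    """Overlapping n-grams, which then act as 'words' for a text search engine."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

intervals = [0, 7, 0, 2, 0, -2, 0, -2, 0, -1, 0, -2, 0, 2, -4]
melody = encode(intervals)
words = ngrams(melody)
print(melody)
print(words[:3])
```

Once a tune is a bag of such words, it can be indexed and queried with an ordinary text retrieval engine, which is the piggy-back idea.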


Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


Snaptell: Book, CD and DVD covers


Spot & Search

[with Suzanne Little]


Near duplicate detection

Works well in 2d: CD covers, wine labels, signs, ...

Less so in near 2d: buildings, vases, …

Not so well in 3d: faces, complex objects, ...


Shazam

Rueger, Multimedia IR, 2010, explains it all! Buy it now

Near duplicate detection: Exercise

Find applications for near-duplicate detection
- be imaginative: the more “outrageous” the better
- can be other media types (audio, smells, haptic, ...)
- can be hard to do


How does near-duplicate detection work?

Fingerprinting technique

1 Compute salient points

2 Extract “characteristics” from vicinity (feature)

3 Make invariant under rotation & scaling

4 Quantise: create visterms

5 Index as in text search engines

6 Check/enforce spatial constraints after retrieval


NDD: Compute salient points and features

[Lowe2004 – http://www.cs.ubc.ca/~lowe/keypoints/]

Eg, SIFT features: each salient point is described by a feature vector of 128 numbers; the vector is invariant to scaling and rotation


NDD: Keypoint feature space clustering

[Figure: feature space containing all keypoint features of all images in the collection (shown as x marks), clustered into millions of “visterms”; example cluster labels: Nine, Geese, Are, Running, Under, A, Wharf, And, Here, I, Am]


Clustering

Hierarchical k-means

[Nister and Stewenius, CVPR 2006]
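Once cluster centroids exist, quantising a keypoint feature means replacing it by the label of its nearest centroid. The sketch below assumes the centroids have already been computed and uses a flat nearest-centroid search rather than the hierarchical k-means tree of the cited paper; the visterm labels (`v0`, `v1`, ...) and the toy 2-d data are hypothetical.

```python
import math

def nearest_centroid(feature, centroids):
    """Return the index of the centroid closest to the feature (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(feature, centroids[i]))

def to_visterms(features, centroids):
    """Replace every feature vector by the label of its nearest centroid."""
    return [f"v{nearest_centroid(f, centroids)}" for f in features]

# Toy 2-d centroids and features; real SIFT features would have 128 components.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
features = [(1.0, 1.0), (9.0, 2.0), (0.5, 8.0), (11.0, -1.0)]
visterms = to_visterms(features, centroids)
print(visterms)
```

A hierarchical vocabulary would descend a tree of centroids instead of scanning all of them, which matters when there are millions of visterms.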


NDD: Encode all images with visterms

Jkjh Geese Bjlkj Wharf
Ojkkjhhj Kssn Klkekjl Here
Lkjkll Wjjkll Kkjlk Bnm
Kllkgjg Lwoe Boerm ...


NDD: query like text

[with Suzanne Little]

At query time compute salient points, keypoint features and visterms

Query against database of images represented as bag of visterms

Joiu Gddwd Bipoi Wueft
Oiooiuui Kwwn Kpodoip Hdfd
Loiopp Wiiopp Koipo Bnm
Kppoyiy Lsld Bldfm ...

Query
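Querying a bag-of-visterms database can be sketched like a weighted text search. The scoring below (idf-weighted overlap of visterm counts) is one plausible choice, not necessarily the one used in the system shown; image ids and visterm labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_index(images):
    """Inverted index: visterm -> {image id: count of that visterm in the image}."""
    index = defaultdict(dict)
    for image_id, visterms in images.items():
        for term, count in Counter(visterms).items():
            index[term][image_id] = count
    return index

def rank(index, n_images, query_visterms):
    """Score images by idf-weighted overlap of visterm counts with the query."""
    scores = Counter()
    for term, q_count in Counter(query_visterms).items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_images / len(postings))
        for image_id, count in postings.items():
            scores[image_id] += idf * min(q_count, count)
    return [image_id for image_id, _ in scores.most_common()]

images = {
    "cover_a": ["v1", "v2", "v3", "v3"],
    "cover_b": ["v2", "v4", "v5"],
    "cover_c": ["v6", "v7"],
}
index = build_index(images)
ranking = rank(index, len(images), ["v3", "v1", "v2", "v9"])
print(ranking)
```

The top-ranked candidates would then go through the spatial-constraint check, since a bag of visterms ignores where in the image each keypoint lies.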


NDD: Check spatial constraints

[with Suzanne Little, SocialLearn project]


How Shazam works - Spectrogram

Compute energy for all (frequency,time) pairs using a Fourier transform under a Hann window w
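A minimal sketch of this computation, using a naive discrete Fourier transform from the standard library so that it stays self-contained (a real system would use an FFT). The window size, hop size and test tone are illustrative assumptions.

```python
import cmath
import math

def hann(n_samples):
    """Hann window w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (n_samples - 1)))
            for n in range(n_samples)]

def spectrogram(signal, window_size, hop):
    """Return energy[t][f] for each window position t and frequency bin f."""
    w = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        chunk = [signal[start + n] * w[n] for n in range(window_size)]
        energies = []
        for f in range(window_size // 2 + 1):
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * f * n / window_size)
                        for n in range(window_size))
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames

# Toy signal: a pure tone that completes 8 cycles per 64-sample window.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
energy = spectrogram(signal, window_size=64, hop=32)
peak_bins = [max(range(len(frame)), key=lambda f: frame[f]) for frame in energy]
print(peak_bins)
```

The window tapers each chunk towards zero at its edges, which reduces the spectral leakage that an abrupt cut would cause.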


Hann window application

How Shazam works: audio fingerprinting


Salient points

Encoding: (f1, f2, t2-t1) hashes to (t1, id)

[Wang (2003), An industrial-strength audio search algorithm, ISMIR]
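One way to picture this encoding is to pack the three numbers into a single integer key and store (t1, id) under it. The split into 10 bits per component is an assumption made for this sketch, as are the function names and the toy peak data.

```python
from collections import defaultdict

BITS = 10  # assumed split: 10 bits each for f1, f2 and t2 - t1 (30 bits in total)

def pack(f1, f2, dt):
    """Pack two frequency bins and a time difference into one 30-bit integer."""
    for value in (f1, f2, dt):
        if not 0 <= value < (1 << BITS):
            raise ValueError("value does not fit into 10 bits")
    return (f1 << (2 * BITS)) | (f2 << BITS) | dt

def index_song(index, song_id, pairs):
    """pairs: iterable of (f1, t1, f2, t2) constellation pairs with t2 >= t1."""
    for f1, t1, f2, t2 in pairs:
        index[pack(f1, f2, t2 - t1)].append((t1, song_id))

index = defaultdict(list)
index_song(index, "song_a", [(100, 5, 220, 9), (220, 9, 310, 20)])
print(index[pack(100, 220, 4)])
```

Because the key depends only on the two frequencies and the time difference, the same pair produces the same key wherever the query clip starts in the song.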


Temporal consistency check of query

Every query vector $(f_1, f_2, t^q_2 - t^q_1)$ is matched to the database.

You get a list of possible $(t^{id}_1, id)$ values (some are false positives).

Create a histogram of $t^{id}_1 - t^q_1$ values (temporal consistency check!)

A substantial peak in this histogram means that the query has matched song $id$ at time offset $t^{id}_1 - t^q_1$.
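The histogram step can be sketched with a counter keyed by (song id, time offset). The index layout (hash key to a list of (t1, id) entries) and the toy hash values are assumptions for illustration.

```python
from collections import Counter

def best_match(index, query_hashes):
    """index: hash -> list of (t1_db, song_id); query_hashes: list of (hash, t1_query)."""
    votes = Counter()
    for h, t1_query in query_hashes:
        for t1_db, song_id in index.get(h, []):
            # One histogram bin per song and time offset.
            votes[(song_id, t1_db - t1_query)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count

# Toy index with made-up hash values.
index = {
    11: [(40, "song_a"), (7, "song_b")],
    22: [(43, "song_a")],
    33: [(47, "song_a"), (90, "song_b")],
}
query = [(11, 0), (22, 3), (33, 7)]
result = best_match(index, query)
print(result)
```

False positives scatter over many different offsets, whereas true matches all agree on one offset, so only the right song builds up a peak.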


Entropy considerations

Specificity: Encoding (f1, f2, t2-t1) to use 30 bits

Exercise: Shazam's constellation pairs

Assume that the typical survival probability of each 30-bit constellation pair, after deformations that we still want to recognise, is p, and that this process is independent per pair. Which encoding density, ie, the number of constellation pairs per second, would you need on average so that a typical query of 10 seconds exhibits at least 10 matches in the right song with a probability of at least 99.99%?

Under these assumptions, further assuming that the constellation pair extraction looks like a random independent and identically distributed number, what is the false positive rate for a database of 4 million songs, each of which is 5 minutes long on average?

Exercise: Shazam's constellation pairs

Which encoding density would you need on average so that a typical query of 10 seconds exhibits at least 10 matches in the right song with a probability of at least 99.99%?

- approximately 1 match per second needed (n = pairs/second)

- Exact solution: binomial distribution

- Large n: approximate binomial distribution with N(np, sqrt(np(1-p)))
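The exact binomial route can be tried numerically. This is a sketch under the exercise's assumptions (independent survival with probability p per pair); the function names are hypothetical and p must lie in (0, 1].

```python
from math import comb

def prob_at_least(k, n_pairs, p):
    """P(X >= k) for X ~ Binomial(n_pairs, p)."""
    below = sum(comb(n_pairs, i) * p**i * (1 - p)**(n_pairs - i) for i in range(k))
    return 1.0 - below

def pairs_needed(p, k=10, target=0.9999):
    """Smallest number of pairs in the query with P(at least k survive) >= target."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n_pairs = k
    while prob_at_least(k, n_pairs, p) < target:
        n_pairs += 1
    return n_pairs

def density_per_second(p, query_seconds=10):
    """Encoding density n in pairs per second for a query of the given length."""
    return pairs_needed(p) / query_seconds

print(density_per_second(1.0))
print(density_per_second(0.5))
```

With p = 1 every pair survives, so 10 pairs in 10 seconds suffice; smaller p pushes the required density well above 1/p because of the 99.99% requirement.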

Exercise: Shazam's constellation pairs

Assuming that the constellation pair extraction looks like a random independent and identically distributed number, what is the false positive rate for a database of 4 million songs, each of which is 5 minutes long on average?

Zero:

- 5 min = 30 * 10 sec
- m = 2^-30 (assume all 2^30 values are distinctive)
- p(query matches one segment) approx m^10 approx 2^-300
- 1-(1-p(qms))^(30*4e6) approx 120e6*m^10, still near zero


Philips Research

Divide frequency scale into 33 frequency bands between 300 Hz and 2000 Hz

Logarithmic spread – each frequency step is 1/12 octave, ie, one semitone

Divide time axis into blocks of 256 windows of 11.6 ms (3 seconds)

E(m,n) is the energy of the m-th frequency at n-th time in spectrogram

For each block extract 256 sub-fingerprints of 32 bits each

[Haitsma and Kalker, 2003]
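The slide does not spell out how the 32 bits are derived from 33 bands. The sketch below uses the rule from the cited Haitsma and Kalker paper: bit m is 1 when the energy difference between neighbouring bands m and m+1 increases from one time step to the next. The toy energy values and function names are hypothetical.

```python
def sub_fingerprint(energy, n):
    """32-bit sub-fingerprint at time n from energy[m][n] with 33 bands m = 0..32."""
    if n < 1:
        raise ValueError("need a previous time step")
    bits = 0
    for m in range(32):
        now = energy[m][n] - energy[m + 1][n]
        before = energy[m][n - 1] - energy[m + 1][n - 1]
        bits = (bits << 1) | (1 if now - before > 0 else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits between two sub-fingerprints."""
    return bin(a ^ b).count("1")

# Toy energies: 33 bands, 2 time steps.
energy = [[float(m), float(m * m % 7)] for m in range(33)]
fp = sub_fingerprint(energy, 1)
print(f"{fp:032b}")
```

Matching then compares sub-fingerprints by Hamming distance, so a few flipped bits still count as a hit.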


Partial fingerprint block


Probability of at least one sub-fingerprint surviving with no more than 4 errors
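The slide's plot is not reproduced here, but the quantity can be estimated under a strong simplifying assumption: every bit flips independently with the same bit error rate, and the 256 sub-fingerprints of a block are independent of each other. Both assumptions and the function names are this sketch's, not necessarily those behind the original figure.

```python
from math import comb

def p_sub_fingerprint_ok(bit_error_rate, n_bits=32, max_errors=4):
    """P(a 32-bit sub-fingerprint has at most max_errors flipped bits)."""
    return sum(comb(n_bits, i) * bit_error_rate**i * (1 - bit_error_rate)**(n_bits - i)
               for i in range(max_errors + 1))

def p_block_has_hit(bit_error_rate, n_sub=256):
    """P(at least one of n_sub sub-fingerprints survives with at most 4 errors)."""
    return 1.0 - (1.0 - p_sub_fingerprint_ok(bit_error_rate)) ** n_sub

for ber in (0.05, 0.15, 0.25, 0.35):
    print(ber, p_block_has_hit(ber))
```

In real audio, neighbouring sub-fingerprints overlap heavily in time, so their errors are correlated and the independence assumption is optimistic.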


Quantisation through locality sensitive hashing (LSH)


Redundancy is key

Use L independent hash vectors of k components each, both for the query and for each multimedia object.

Database elements that match at least m out of L times are candidates for nearest neighbours.

Choose w, k and L (wisely) at runtime

- w determines granularity of bins, ie, # of bits for hi(v)

- k and L determine probability of matching
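A minimal sketch of this scheme, assuming the common random-projection form of the hash components, h_i(v) = floor((a_i·v + b_i) / w) with Gaussian a_i and uniform b_i. That form, the parameter values and the function names are assumptions of this sketch.

```python
import math
import random

def make_hash_vectors(dim, k, L, w, seed=0):
    """L hash vectors, each with k components (a, b): a is Gaussian, b uniform in [0, w]."""
    rng = random.Random(seed)
    return [[([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)] for _ in range(L)]

def lsh_keys(v, hash_vectors, w):
    """One k-tuple of bin numbers floor((a.v + b) / w) per hash vector."""
    return [tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
                  for a, b in components)
            for components in hash_vectors]

def matches(keys_a, keys_b):
    """Number of the L hash vectors on which two objects agree in all k components."""
    return sum(1 for ka, kb in zip(keys_a, keys_b) if ka == kb)

w = 2.0
hash_vectors = make_hash_vectors(dim=3, k=4, L=5, w=w)
query_keys = lsh_keys([1.0, 2.0, 3.0], hash_vectors, w)
print(matches(query_keys, query_keys))
```

An object becomes a nearest-neighbour candidate when `matches` reaches the chosen threshold m; larger w makes bins coarser, larger k makes each hash vector stricter, larger L adds more chances to match.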


Prob(min 1 match out of L)

L fixed, k variable


Prob(min 1 match out of L)

k fixed, L variable
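The curves on these two slides can be recomputed under the usual independence assumption: if a single hash component agrees with probability p, then all k components of one hash vector agree with probability p^k, and at least one of L hash vectors matches with probability 1 - (1 - p^k)^L. The function name is hypothetical.

```python
def prob_at_least_one_match(p, k, L):
    """p: probability that a single hash component agrees for the two objects."""
    return 1.0 - (1.0 - p ** k) ** L

# L fixed, k variable: the curve gets steeper and moves right as k grows.
for k in (1, 2, 4, 8):
    print("k =", k, [round(prob_at_least_one_match(p / 10, k, 10), 3) for p in range(11)])

# k fixed, L variable: the curve moves left as L grows.
for L in (1, 5, 20):
    print("L =", L, [round(prob_at_least_one_match(p / 10, 4, L), 3) for p in range(11)])
```

The S-shape is what makes the scheme useful: near objects (high p) almost always match somewhere, far objects (low p) almost never do.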

Exercise: compute inflection point


Min hash

Estimate discrete set overlap

An example: 4 documents

D1 = Humpty Dumpty sat on a wall,

D2 = Humpty Dumpty had a great fall.

D3 = All the King's horses, And all the King's men

D4 = Couldn't put Humpty together again!


Surrogate docs after stop word removal and stemming

A1 = {humpty, dumpty, sat, wall}

A2 = {humpty, dumpty, great, fall}

A3 = {all, king, horse, men}

A4 = {put, humpty, together, again}


Equivalent term-document matrix
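The matrix itself did not survive in this transcript; it can be rebuilt from the four surrogate sets listed earlier. The alphabetical row order is a choice of this sketch and may differ from the original slide.

```python
surrogates = {
    "A1": {"humpty", "dumpty", "sat", "wall"},
    "A2": {"humpty", "dumpty", "great", "fall"},
    "A3": {"all", "king", "horse", "men"},
    "A4": {"put", "humpty", "together", "again"},
}
docs = sorted(surrogates)
terms = sorted(set().union(*surrogates.values()))

# One row per term, one 0/1 column per document.
matrix = {t: [1 if t in surrogates[d] else 0 for d in docs] for t in terms}

print(f"{'term':<9}", *docs)
for t in terms:
    print(f"{t:<9}", *[f"{x:>2}" for x in matrix[t]])
```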

Estimation of similarity through random permutations

Surrogate documents from random permutations

Keep first occurring word of $A_i$ in $\pi_j$ for dense surrogate representation
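A minimal sketch of this estimate, using three of the surrogate sets from the example. For each random permutation of the vocabulary the first word that occurs in a document is kept; the fraction of permutations on which two documents keep the same word estimates their set overlap |A ∩ B| / |A ∪ B|. The number of permutations and the seed are arbitrary choices.

```python
import random

def jaccard(a, b):
    """Exact set overlap |a & b| / |a | b|."""
    return len(a & b) / len(a | b)

def minhash_signature(doc, permutations):
    """For each permutation keep the first word of it that occurs in the document."""
    return [next(word for word in perm if word in doc) for perm in permutations]

def estimate_overlap(sig_a, sig_b):
    """Fraction of permutations on which both documents kept the same word."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)

A1 = {"humpty", "dumpty", "sat", "wall"}
A2 = {"humpty", "dumpty", "great", "fall"}
A3 = {"all", "king", "horse", "men"}
vocabulary = sorted(A1 | A2 | A3)

rng = random.Random(0)
permutations = [rng.sample(vocabulary, len(vocabulary)) for _ in range(2000)]
s1, s2, s3 = (minhash_signature(d, permutations) for d in (A1, A2, A3))

print(jaccard(A1, A2), estimate_overlap(s1, s2))
print(jaccard(A1, A3), estimate_overlap(s1, s3))
```

The signature has a fixed length regardless of document size, which is what makes it a dense surrogate suitable for hashing.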


SIFT

Scale Invariant Feature Transform

“distinctive invariant image features that can be used to perform reliable matching between different views of an object or scene.”

Invariant to image scale and rotation.

Robust to substantial range of affine distortion, changes in 3D viewpoint, addition of noise and change in illumination.

[Lowe, D.G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60, 2, pp. 91-110.]


SIFT Implementation

For a given image:

Detect scale space extrema

Localise candidate keypoints

Assign an orientation to each keypoint

Produce keypoint descriptor


A scale space visualisation

Scale


Difference of Gaussian image creation

[Figure: within each scale octave, neighbouring Gaussian images are subtracted (-) to produce the difference-of-Gaussian images]


Gaussian blur illustration


Difference of Gaussian illustration


The SIFT keypoint system

Once the Difference of Gaussian images have been generated:

● Each pixel in the images is compared to 8 neighbours at the same scale.

● It is also compared to 9 corresponding neighbours in the scale above and 9 corresponding neighbours in the scale below.

● Each pixel is thus compared to 26 neighbouring pixels in 3x3 regions across scales, as it is not compared to itself at the current scale.

● A pixel is selected as a SIFT keypoint candidate only if its intensity value is extreme, ie, larger or smaller than all 26 neighbours.
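The 26-neighbour comparison can be sketched on a stack of difference-of-Gaussian images stored as nested lists. The data layout `dog[scale][row][column]` and the toy values are assumptions; the caller is expected to keep the position away from the borders of the stack.

```python
def is_extremum(dog, s, y, x):
    """dog[s][y][x]: difference-of-Gaussian value at scale s, row y, column x."""
    value = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    # Strictly larger or strictly smaller than all 26 neighbours.
    return value > max(neighbours) or value < min(neighbours)

# Toy stack: three 3x3 scales, all zero except a peak in the centre.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 5.0
print(is_extremum(dog, 1, 1, 1))
```

Candidates found this way are then localised more precisely and filtered before orientation assignment.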


Pixel neighbourhood comparison

Scale


Orientation assignment

Orientation histogram with 36 bins – one per 10 degrees.

Each sample weighted by gradient magnitude and Gaussian window.

Canonical orientation at peak of smoothed histogram.

Where two or more orientations are detected, keypoints created for each orientation.
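These steps can be sketched as follows. The smoothing kernel (0.25, 0.5, 0.25) and the sample values are assumptions of this sketch; the 80% threshold for additional orientations follows Lowe's paper. Real implementations also interpolate the peak position, which is omitted here.

```python
def orientation_histogram(samples, n_bins=36):
    """samples: iterable of (gradient_magnitude, angle_degrees, gaussian_weight)."""
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for magnitude, angle, weight in samples:
        hist[int((angle % 360.0) // bin_width) % n_bins] += magnitude * weight
    return hist

def smooth(hist):
    """Circular smoothing with the kernel (0.25, 0.5, 0.25)."""
    n = len(hist)
    return [0.25 * hist[i - 1] + 0.5 * hist[i] + 0.25 * hist[(i + 1) % n]
            for i in range(n)]

def dominant_orientations(hist, ratio=0.8):
    """Bin centres (degrees) of local peaks within `ratio` of the highest peak."""
    n, top = len(hist), max(hist)
    width = 360.0 / n
    return [(i + 0.5) * width for i in range(n)
            if hist[i] >= ratio * top
            and hist[i] > hist[i - 1] and hist[i] > hist[(i + 1) % n]]

# Toy gradient samples around a keypoint: (magnitude, angle, Gaussian weight).
samples = [(1.0, 95.0, 1.0), (2.0, 92.0, 1.0), (1.0, 98.0, 1.0),
           (3.5, 271.0, 1.0), (0.2, 10.0, 1.0)]
hist = smooth(orientation_histogram(samples))
print(dominant_orientations(hist))
```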


The SIFT keypoint descriptor

We now have location, scale and orientation for each SIFT keypoint (“keypoint frame”).

→ descriptor for local image region is required.

Must be as invariant as possible to changes in illumination and 3D viewpoint.

Set of orientation histograms are computed on 4x4 pixel areas.

Each gradient histogram contains 8 bins and each descriptor contains a 4 x 4 array of histograms.

→ SIFT descriptor as 128 (4 x 4 x 8) element histogram


Visualising the keypoint descriptor


Example SIFT keypoints


Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


Automated annotation as machine translation

water grass trees

the beautiful sun

le soleil beau


Automated annotation as machine learning

Probabilistic models:
- maximum entropy models
- models for joint and conditional probabilities
- evidence combination with Support Vector Machines

[with Magalhães, SIGIR 2005]
[with Yavlinsky and Schofield, CIVR 2005]
[with Yavlinsky, Heesch and Pickering: ICASSP May 2004]
[with Yavlinsky et al CIVR 2005]
[with Yavlinsky SPIE 2007]
[with Magalhães CIVR 2007, best paper]


Automated annotation

Automated: water buildings city sunset aerial

[Corel Gallery 380,000]

[with Yavlinsky et al CIVR 2005]
[with Yavlinsky SPIE 2007]
[with Magalhães CIVR 2007, best paper]


The good

door

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)]
[images: Flickr creative commons]


The bad

wave

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)]
[images: Flickr creative commons]


The ugly

iceberg

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)]
[images: Flickr creative commons]


Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


Why content-based?

Give examples where we remember details by
- metadata?
- context?
- content (eg, “x” belongs to “y”)?

Metadata versus content-based: pro and con
-
-
-
-


Content-based retrieval: features and distances

[Figure: points marked x and o in feature space]


Content-based retrieval: architecture


Features

Visual
- colour, texture, shape, edge detection, SIFT/SURF
Audio
Temporal

How to describe the features?
- For people
- For computers


Digital Images


Content of an image

145 173 201 253 245 245
153 151 213 251 247 247
181 159 225 255 255 255
165 149 173 141  93  97
167 185 157  79 109  97
121 187 161  97 117 115


Histogram

1: 0 – 31
2: 32 – 63
3: 64 – 95
4: 96 – 127
5: 128 – 159
6: 160 – 191
7: 192 – 223
8: 224 – 255

[Figure: histogram over bins 1 to 8; vertical axis from 0 to 0.5]
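As an illustration of the binning, the following sketch computes an 8-bin histogram for the 6x6 block of grey values shown on the "Content of an image" slide. The function name is made up for this example, and the result is not claimed to match the plotted figure, which may be based on a larger image.

```python
PIXELS = [
    [145, 173, 201, 253, 245, 245],
    [153, 151, 213, 251, 247, 247],
    [181, 159, 225, 255, 255, 255],
    [165, 149, 173, 141, 93, 97],
    [167, 185, 157, 79, 109, 97],
    [121, 187, 161, 97, 117, 115],
]


def grey_histogram(pixels, bins=8, levels=256):
    """Count grey values per bin and also return the normalised histogram.

    With 8 bins and 256 levels each bin is 32 values wide:
    bin 1 covers 0-31, bin 2 covers 32-63, ..., bin 8 covers 224-255.
    """
    width = levels // bins
    counts = [0] * bins
    for row in pixels:
        for value in row:
            counts[value // width] += 1
    total = sum(counts)
    return counts, [c / total for c in counts]


counts, hist = grey_histogram(PIXELS)
print(counts)  # [0, 0, 2, 7, 7, 8, 2, 10]
print([round(h, 3) for h in hist])
```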


Exercise

Sketch a 3D colour histogram for

  R   G   B
  0   0   0   black
255   0   0   red
  0 255   0   green
  0   0 255   blue
  0 255 255   cyan
255   0 255   magenta
255 255   0   yellow
255 255 255   white
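One way to check a sketch of the 3D histogram is to compute it. The example below is a minimal illustration that assumes 2 bins per channel (so 2 x 2 x 2 = 8 cells); the bin count and the function name are choices made for this example only.

```python
def colour_histogram_3d(pixels, bins_per_channel=2):
    """Count RGB pixels in a 3D grid of bins indexed as hist[r][g][b]."""
    n = bins_per_channel
    width = 256 // n
    hist = [[[0] * n for _ in range(n)] for _ in range(n)]
    for r, g, b in pixels:
        hist[r // width][g // width][b // width] += 1
    return hist


COLOURS = {
    "black": (0, 0, 0), "red": (255, 0, 0), "green": (0, 255, 0),
    "blue": (0, 0, 255), "cyan": (0, 255, 255), "magenta": (255, 0, 255),
    "yellow": (255, 255, 0), "white": (255, 255, 255),
}

# Each of the eight colours sits at a different corner of the RGB cube,
# so with 2 bins per channel every cell receives exactly one pixel.
hist = colour_histogram_3d(COLOURS.values())
print(hist)
```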


http://blog.xkcd.com/2010/05/03/color-survey-results/


HSB colour model


HSB model

disadvantage: hue coordinate is not continuous
- 0 and 360 degrees have the same meaning but there is a huge difference in terms of numeric distance
- example: red = (0°,100%,50%) = (360°,100%,50%)

advantage: it is more natural to describe colour changes
- “brighter blue”, “purer magenta”, etc
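A common workaround for the discontinuity is to measure hue differences around the circle rather than along the number line. The helper below is a small sketch of that idea; the function name is an assumption and the approach is not taken from the slides.

```python
def hue_distance(h1, h2):
    """Angular distance between two hues in degrees, in the range 0 to 180."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


print(hue_distance(0, 360))   # 0.0: both values denote the same hue
print(hue_distance(350, 10))  # 20.0, although the raw difference is 340
```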


Texture

coarseness contrast directionality


Shape Analysis

shape = class of geometric objects invariant under
- translation
- scale (changes keeping the aspect ratio)
- rotations

information preserving description (for compression)

non-information preserving (for retrieval)
- boundary based (ignore interior)
- region based (boundary+interior)


Localisation

[Figure: histograms over bins 1 to 8]

64% centre

36% border


Tiled Histograms

[Figure: six histograms, each over bins 1 to 8]
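Tiled histograms keep some spatial information by computing one histogram per image region and concatenating the results. The sketch below assumes a grey-level image given as nested lists and a 2 x 3 grid of tiles; both the grid size and the function names are assumptions for this example.

```python
def grey_histogram(values, bins=8, levels=256):
    """Normalised histogram of a flat list of grey values."""
    width = levels // bins
    counts = [0] * bins
    for v in values:
        counts[v // width] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def tiled_histograms(image, rows=2, cols=3, bins=8):
    """Split the image into rows x cols tiles and concatenate their histograms.

    Tiles are visited row by row, so the feature vector has
    rows * cols * bins components.
    """
    height, width = len(image), len(image[0])
    feature = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tile = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feature.extend(grey_histogram(tile, bins))
    return feature


# Usage: an image that is dark on the left and bright on the right.
image = [[0, 0, 0, 0, 255, 255] for _ in range(4)]
print(len(tiled_histograms(image)))  # 48
```

A single global histogram of this image could not tell whether the bright pixels are on the left or on the right; the tiled version can.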


Video Segmentation

gradual transition detection (eg, fade)
- accumulate distances
- long-range comparison

audio cues
- silence and/or speaker change

motion detection and analysis
- camera motion, zoom, object motion
- MPEG provides some motion vectors


Long range comparison

At time t define distance d_n(t)
- compare frames t-n+i and t+i (i=0,...,n-1)
- average their respective distances over i

Peak in d_n(t) detected if
- d_n(t) > threshold and
- d_n(t) > d_n(s) for all neighbouring s

Shot boundary = near-coincident peaks of d_16 and d_8

[Figure: time axis showing the n frames compared around time t]
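The long-range comparison can be sketched as below. Frames are assumed to be represented by feature vectors such as histograms, the per-frame distance is taken to be Manhattan distance, and the neighbourhood `radius` and the `tolerance` for "near-coincident" peaks are illustrative choices; none of these details are specified on the slide.

```python
def l1(a, b):
    """Manhattan distance between two frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def long_range_distance(frames, t, n, dist=l1):
    """d_n(t): average distance between frames t-n+i and t+i, i = 0..n-1."""
    return sum(dist(frames[t - n + i], frames[t + i]) for i in range(n)) / n


def detect_peaks(frames, n, threshold, radius=None):
    """Times t where d_n(t) exceeds the threshold and all neighbouring values."""
    radius = n if radius is None else radius
    valid = range(n, len(frames) - n + 1)
    d = {t: long_range_distance(frames, t, n) for t in valid}
    peaks = []
    for t in valid:
        neighbours = [d[s] for s in range(t - radius, t + radius + 1)
                      if s != t and s in d]
        if d[t] > threshold and all(d[t] > v for v in neighbours):
            peaks.append(t)
    return peaks


def shot_boundaries(frames, threshold, tolerance=2):
    """Report peaks of d_16 that have a peak of d_8 within `tolerance` frames."""
    peaks8 = detect_peaks(frames, 8, threshold)
    peaks16 = detect_peaks(frames, 16, threshold)
    return [t for t in peaks16 if any(abs(t - s) <= tolerance for s in peaks8)]


# Usage: 40 frames of one shot followed by 40 frames of another (hard cut).
frames = [[1.0, 0.0]] * 40 + [[0.0, 1.0]] * 40
print(shot_boundaries(frames, threshold=1.0))  # [40]
```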


Features and distances

[Figure: points marked x and o in feature space]


Distances and similarities

assumes coding of MM objects as data vectors

distance measures
- Euclidean, Manhattan

correlation measures
- Cosine similarity measure

histogram intersection for normalised histograms
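The measures named above can be written down in a few lines. This is a plain-Python sketch with assumed function names; it expects two vectors of equal length and, for cosine similarity, non-zero vectors.

```python
import math


def euclidean(x, y):
    """L2 distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """L1 distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def histogram_intersection(x, y):
    """Overlap of two histograms; 1 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(x, y))


x = [0.5, 0.5, 0.0]
y = [0.0, 0.5, 0.5]
print(euclidean(x, y), manhattan(x, y))
print(cosine_similarity(x, y), histogram_intersection(x, y))
```

For histograms that each sum to 1, histogram intersection equals 1 minus half the Manhattan distance, so ranking by one is equivalent to ranking by the other.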


L2 vs L1


What happens at p<1?

[Figure: mean average precision plotted against p]

[with Howarth, ECIR 2005]
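L1 and L2 are the p=1 and p=2 members of the Minkowski family, and the same formula can be evaluated for p<1. The sketch below (function name assumed) shows one known consequence: for p<1 the triangle inequality can fail, so the result is a dissimilarity rather than a true distance. It says nothing about retrieval performance, which is what the figure reports.

```python
def minkowski(x, y, p):
    """(sum_i |x_i - y_i|^p)^(1/p); a metric only for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

# p = 2: the direct route a -> c is no longer than the detour via b.
print(minkowski(a, c, 2), minkowski(a, b, 2) + minkowski(b, c, 2))

# p = 0.5: the direct route is longer than the detour via b.
print(minkowski(a, c, 0.5), minkowski(a, b, 0.5) + minkowski(b, c, 0.5))
```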


Other distance measures

• Squared chord
• Earth Mover's Distance
• Chi squared distance
• Kullback-Leibler divergence (not a true distance)
• Ordinal distances (for string values)
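Three of these measures are easy to sketch for non-negative histograms. The definitions below follow one common convention each (chi squared in particular has several variants in the literature), and the function names and the `eps` guard are assumptions for this example. Earth Mover's Distance and ordinal distances are not covered.

```python
import math


def squared_chord(x, y):
    """Sum of squared differences of the square roots of the components."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))


def chi_squared(x, y):
    """Symmetric chi squared variant: sum of (a - b)^2 / (a + b)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def kl_divergence(x, y, eps=1e-12):
    """Kullback-Leibler divergence KL(x || y); eps avoids division by zero."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(x, y) if a > 0)


x = [0.5, 0.5]
y = [0.9, 0.1]
print(squared_chord(x, y), chi_squared(x, y))

# KL is not symmetric, which is one reason it is not a true distance.
print(kl_divergence(x, y), kl_divergence(y, x))
```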


Best distance?

Squared chord

[with Liu et al, AIRS 2008; with Hu et al, ICME 2008]


Recap: Multimedia information retrieval

1. What is multimedia information retrieval?

2. Metadata and piggyback retrieval

3. Multimedia fingerprinting

4. Automated annotation

5. Content-based retrieval


Multimedia Information Retrieval

Prof Stefan Rüger

Multimedia and Information Systems
Knowledge Media Institute

The Open University

http://kmi.open.ac.uk/mmis