Multimedia Indexing and Dimensionality Reduction
Multimedia Data Management
• The need to query and analyze vast amounts of multimedia data (e.g., images, sound tracks, video tracks) has increased in recent years.
• Joint research from database management, computer vision, signal processing, and pattern recognition aims to solve problems related to multimedia data management.
Multimedia Data
• There are four major types of multimedia data: images, video sequences, sound tracks, and text.
• Of these, the easiest type to manage is text, since we can order, index, and search text using string-management techniques.
• Management of simple sounds is also possible by representing audio as signal sequences over different channels.
• Image retrieval has received a lot of attention in the last decade (CV and DBs). The main techniques can also be extended and applied to video retrieval.
Content-based Image Retrieval
• Images were traditionally managed by first annotating their contents and then using text-retrieval techniques to index them.
• However, with the increase of information in digital image format, some drawbacks of this technique were revealed:
  • Manual annotation requires a vast amount of labor.
  • Different people may perceive the contents of an image differently; thus no objective keywords for search are defined.
• A new research field was born in the 90's: Content-based Image Retrieval aims at indexing and retrieving images based on their visual contents.
Feature Extraction
• The basis of Content-based Image Retrieval is to extract and index some visual features of the images.
• There are general features (e.g., color, texture, shape) and domain-specific features (e.g., objects contained in the image).
• Domain-specific feature extraction can vary with the application domain and is based on pattern recognition.
• On the other hand, general features can be used independently of the image domain.
Color Features
• To represent the color of an image compactly, a color histogram is used. Colors are partitioned into k groups according to their similarity, and the percentage of each group in the image is measured.
• Images are thus transformed into k-dimensional points, and a distance metric (e.g., Euclidean distance) is used to measure the similarity between them.
(Figure: image → k-bin histogram → point in k-dimensional space.)
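As a minimal sketch of the idea above (not code from the slides), the k-bin histogram and the Euclidean distance on the resulting k-dimensional points might look as follows; the bin count k = 8 and the synthetic "images" are illustrative assumptions:

```python
import numpy as np

def color_histogram(pixels, k=8):
    """Map an image (array of color-index values in [0, 256)) to a
    k-dimensional point: the fraction of pixels falling in each of k bins."""
    counts, _ = np.histogram(pixels, bins=k, range=(0, 256))
    return counts / counts.sum()  # percentages, so image size cancels out

def euclidean(p, q):
    return float(np.linalg.norm(p - q))

# Two synthetic "images": one dark, one bright.
dark = np.full(1000, 30)
bright = np.full(1000, 220)
h1, h2 = color_histogram(dark), color_histogram(bright)
assert h1.shape == (8,) and abs(h1.sum() - 1.0) < 1e-9
# a similar dark image is closer to h1 than the bright one is
assert euclidean(h1, color_histogram(np.full(500, 20))) < euclidean(h1, h2)
```

Because the histogram stores percentages, two images of different sizes but similar color content map to nearby k-dimensional points.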
Using Transformations to Reduce Dimensionality
• In many cases the embedded dimensionality of a search problem is much lower than the actual dimensionality.
• Some methods apply transformations on the data and approximate them with low-dimensional vectors.
• The aim is to reduce dimensionality and at the same time maintain the data characteristics.
• If d(a,b) is the distance between two objects a, b in the real (high-dimensional) space and d'(a',b') is their distance in the transformed low-dimensional space, we want d'(a',b') ≤ d(a,b), so that no qualifying object is missed when filtering in the low-dimensional space.
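The lower-bounding requirement d'(a',b') ≤ d(a,b) can be illustrated with the simplest possible transform, truncating vectors to their first k coordinates (an illustrative stand-in for the transformations discussed here): dropping squared terms can only shrink a Euclidean distance.

```python
import numpy as np

def project(x, k):
    """Keep only the first k coordinates: a trivially contractive transform
    for Euclidean distance, since dropping terms from a sum of squares
    can only make it smaller."""
    return x[:k]

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)
d = float(np.linalg.norm(a - b))                        # original space
d_low = float(np.linalg.norm(project(a, 8) - project(b, 8)))  # reduced space
assert d_low <= d  # no false dismissals when filtering with d'
```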
Problem - Motivation
Given a database of documents, find the documents containing "data", "retrieval".
Applications:
• Web
• law + patent offices
• digital libraries
• information filtering
Problem - Motivation
Types of queries:
• boolean ('data' AND 'retrieval' AND NOT ...)
• additional features ('data' ADJACENT 'retrieval')
• keyword queries ('data', 'retrieval')
How to search a large collection of documents?
Text – Inverted Files
Q: space overhead?
A: mainly, the postings lists
how to organize the dictionary?
stemming – Y/N? (keep only the root of each word, e.g., inverted, inversion → invert)
insertions?
how to organize the dictionary? B-tree, hashing, TRIEs, PATRICIA trees, ...
stemming – Y/N? insertions?
postings lists – more. Zipf distribution: e.g., rank-frequency plot of the 'Bible':
freq ≈ 1 / (rank · ln(1.78 V))
(Figure: log(freq) vs. log(rank) is roughly a straight line; V is the vocabulary size.)
postings lists:
• Cutting+Pedersen (keep first 4 in B-tree leaves)
• how to allocate space: [Faloutsos+92] geometric progression
• compression (Elias codes) [Zobel+] – down to 2% overhead!
Conclusions: inversion needs space overhead (2%-300%), but it is the fastest.
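A toy sketch of an inverted file, with a dictionary mapping each term to its postings list (the sample documents and the plain dict layout are illustrative assumptions, not the B-tree/compressed layouts cited above):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Dictionary of term -> postings list (ascending doc ids)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)
    return index

docs = ["data retrieval systems",
        "signal processing",
        "data management and data retrieval"]
index = build_inverted_index(docs)

# boolean AND query: intersect the postings lists
hits = set(index["data"]) & set(index["retrieval"])
assert hits == {0, 2}
```

Since postings lists are kept sorted, real systems intersect them with a merge rather than via sets, and compress them (e.g., Elias codes) to cut the space overhead.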
Text - Detailed outline
• Text databases problem
• full text scanning
• inversion
• signature files (a.k.a. Bloom Filters)
• Vector model and clustering
• information filtering and LSI
Vector Space Model and Clustering
Keyword (free-text) queries (vs. Boolean):
• each document -> vector (HOW?)
• each query -> vector
• search for 'similar' vectors
Vector Space Model and Clustering
main idea: each document is a vector of size d, where d is the number of different terms in the database (the vocabulary size)
(Figure: a document containing '...data...' maps to a d-dimensional vector with one coordinate per vocabulary term, from 'aaron' to 'zoo'; the coordinate for 'data' is set during 'indexing'.)
Document Vectors
Documents are represented as "bags of words" OR as vectors.
• A vector is like an array of floating-point numbers
• It has direction and magnitude
• Each vector holds a place for every term in the collection
• Therefore, most vectors are sparse
Document Vectors: one location for each word.

| docs | nova | galaxy | heat | h'wood | film | role | diet | fur |
|------|------|--------|------|--------|------|------|------|-----|
| A    | 10   | 5      | 3    |        |      |      |      |     |
| B    | 5    | 10     |      |        |      |      |      |     |
| C    |      | 10     | 8    | 7      |      |      |      |     |
| D    |      |        |      | 9      | 10   | 5    |      |     |
| E    |      |        |      |        | 10   | 10   |      |     |
| F    |      |        |      | 9      | 10   |      |      |     |
| G    | 5    | 7      |      | 9      |      |      |      |     |
| H    |      |        |      | 6      | 10   |      | 2    | 8   |
| I    |      |        |      | 7      | 5    |      | 1    | 3   |

"Nova" occurs 10 times in text A, "Galaxy" 5 times, "Heat" 3 times. (Blank means 0 occurrences.)
Document Vectors: one location for each word (same table as above).
"Hollywood" occurs 7 times in text I, "Film" 5 times, "Diet" 1 time, "Fur" 3 times.
Document Vectors: A–I in the table above are document ids.
We Can Plot the Vectors
(Figure: axes 'Star' and 'Diet'; the docs about astronomy and about movie stars lie near the 'Star' axis, the doc about mammal behavior near the 'Diet' axis.)
Assigning Weights to Terms
• Binary weights
• Raw term frequency
• tf x idf
Recall the Zipf distribution: we want to weight terms highly if they are frequent in relevant documents … BUT infrequent in the collection as a whole.
Binary Weights
Only the presence (1) or absence (0) of a term is included in the vector.

| docs | t1 | t2 | t3 |
|------|----|----|----|
| D1   | 1  | 0  | 1  |
| D2   | 1  | 0  | 0  |
| D3   | 0  | 1  | 1  |
| D4   | 1  | 0  | 0  |
| D5   | 1  | 1  | 1  |
| D6   | 1  | 1  | 0  |
| D7   | 0  | 1  | 0  |
| D8   | 0  | 1  | 0  |
| D9   | 0  | 0  | 1  |
| D10  | 0  | 1  | 1  |
| D11  | 1  | 0  | 1  |
Raw Term Weights
The frequency of occurrence of the term in each document is included in the vector.

| docs | t1 | t2 | t3 |
|------|----|----|----|
| D1   | 2  | 0  | 3  |
| D2   | 1  | 0  | 0  |
| D3   | 0  | 4  | 7  |
| D4   | 3  | 0  | 0  |
| D5   | 1  | 6  | 3  |
| D6   | 3  | 5  | 0  |
| D7   | 0  | 8  | 0  |
| D8   | 0  | 10 | 0  |
| D9   | 0  | 0  | 1  |
| D10  | 0  | 3  | 5  |
| D11  | 4  | 0  | 1  |
Assigning Weights
tf x idf measure:
• term frequency (tf)
• inverse document frequency (idf) -- a way to deal with the problems of the Zipf distribution
Goal: assign a tf x idf weight to each term in each document.
tf x idf
w_ik = tf_ik · log(N / n_k)

where:
• T_k = term k
• tf_ik = frequency of term T_k in document D_i
• idf_k = inverse document frequency of term T_k in collection C: idf_k = log(N / n_k)
• N = total number of documents in the collection C
• n_k = number of documents in C that contain T_k
Inverse Document Frequency
IDF provides high values for rare words and low values for common words
For a collection of 10000 documents:
• term in 1 document: idf = log(10000/1) = 4
• term in 20 documents: idf = log(10000/20) = 2.698
• term in 5000 documents: idf = log(10000/5000) = 0.301
• term in 10000 documents: idf = log(10000/10000) = 0
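The values above can be reproduced with a base-10 logarithm, which these slides evidently use:

```python
import math

N = 10000  # documents in the collection
for n_k, expected in [(1, 4.0), (20, 2.698), (5000, 0.301), (10000, 0.0)]:
    idf = math.log10(N / n_k)          # idf_k = log(N / n_k)
    assert abs(idf - expected) < 0.005  # matches the slide to 3 decimals
    print(f"term in {n_k:5d} docs -> idf = {idf:.3f}")
```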
Similarity Measures for document vectors
For Boolean (set-valued) representations of a query Q and a document D:
• Simple matching (coordination level match): |Q ∩ D|
• Dice's Coefficient: 2 |Q ∩ D| / (|Q| + |D|)
• Jaccard's Coefficient: |Q ∩ D| / |Q ∪ D|
• Cosine Coefficient: |Q ∩ D| / (|Q|^(1/2) · |D|^(1/2))
• Overlap Coefficient: |Q ∩ D| / min(|Q|, |D|)
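A small sketch of the five coefficients on Python sets (the sample query and document are illustrative assumptions):

```python
def simple_match(q, d): return len(q & d)
def dice(q, d):        return 2 * len(q & d) / (len(q) + len(d))
def jaccard(q, d):     return len(q & d) / len(q | d)
def cosine(q, d):      return len(q & d) / (len(q) ** 0.5 * len(d) ** 0.5)
def overlap(q, d):     return len(q & d) / min(len(q), len(d))

Q = {"data", "retrieval"}
D = {"data", "retrieval", "management", "systems"}
assert simple_match(Q, D) == 2
assert dice(Q, D) == 2 * 2 / (2 + 4)
assert jaccard(Q, D) == 2 / 4
assert overlap(Q, D) == 1.0   # Q is fully contained in D
assert abs(cosine(Q, D) - 2 / (2 ** 0.5 * 2)) < 1e-12
```

Note how the overlap coefficient saturates at 1 as soon as one set is contained in the other, while Jaccard still penalizes the size difference.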
tf x idf normalization
• Normalize the term weights (so longer documents are not unfairly given more weight).
• "Normalize" usually means force all values to fall within a certain range, usually between 0 and 1, inclusive.

w_ik = tf_ik · log(N / n_k) / sqrt( Σ_{j=1..t} (tf_ij)² · [log(N / n_j)]² )
Vector space similarity (use the weights to compare the documents)
Now, the similarity of two documents is:

sim(D_i, D_j) = Σ_{k=1..t} w_ik · w_jk

This is also called the cosine, or normalized inner product.
Computing Similarity Scores
With Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7):
• cos θ1 = 0.74 (angle between Q and D1)
• cos θ2 = 0.98 (angle between Q and D2)
(Figure: the three vectors plotted in the unit square; Q points much closer to D2.)
Vector Space with Term Weights and Cosine Matching
D_i = (d_i1, w_di1; d_i2, w_di2; …; d_it, w_dit)
Q = (q_1, w_q1; q_2, w_q2; …; q_t, w_qt)

sim(Q, D_i) = Σ_{j=1..t} w_qj · w_dij / sqrt( Σ_{j=1..t} (w_qj)² · Σ_{j=1..t} (w_dij)² )

Example: Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7)

sim(Q, D2) = (0.4·0.2 + 0.8·0.7) / sqrt( [(0.4)² + (0.8)²] · [(0.2)² + (0.7)²] ) = 0.64 / sqrt(0.42) ≈ 0.98

sim(Q, D1) = 0.56 / sqrt(0.58) ≈ 0.74
(Figure: Q, D1, D2 plotted against the Term A / Term B axes.)
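The worked example can be checked directly; this sketch recovers the slide's 0.98 and 0.74 within 0.01 (the slide rounds its intermediate values):

```python
import math

def cosine_sim(q, d):
    """Cosine of the angle between two weight vectors."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    return dot / math.sqrt(sum(w * w for w in q) * sum(w * w for w in d))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
assert abs(cosine_sim(Q, D2) - 0.98) < 0.01   # 0.64 / sqrt(0.424...)
assert abs(cosine_sim(Q, D1) - 0.74) < 0.01   # 0.56 / sqrt(0.584...)
```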
Text - Detailed outline
• Text databases problem
• full text scanning
• inversion
• signature files (a.k.a. Bloom Filters)
• Vector model and clustering
• information filtering and LSI
Information Filtering + LSI [Foltz+,'92]
Goal: users specify interests (= keywords); the system alerts them on suitable news documents.
Major contribution: LSI = Latent Semantic Indexing
• latent ('hidden') concepts
Information Filtering + LSI
Main idea:
• map each document into some 'concepts'
• map each term into some 'concepts'
'Concept': ~ a set of terms, with weights, e.g., "data" (0.8), "system" (0.5), "retrieval" (0.6) -> DBMS_concept
Information Filtering + LSI
Pictorially: term-document matrix (BEFORE) ...

|     | 'data' | 'system' | 'retrieval' | 'lung' | 'ear' |
|-----|--------|----------|-------------|--------|-------|
| TR1 | 1      | 1        | 1           |        |       |
| TR2 | 1      | 1        | 1           |        |       |
| TR3 |        |          |             | 1      | 1     |
| TR4 |        |          |             | 1      | 1     |
Information Filtering + LSI
Pictorially: concept-document matrix ... and

|     | 'DBMS-concept' | 'medical-concept' |
|-----|----------------|-------------------|
| TR1 | 1              |                   |
| TR2 | 1              |                   |
| TR3 |                | 1                 |
| TR4 |                | 1                 |
Information Filtering + LSI
... and concept-term matrix

|           | 'DBMS-concept' | 'medical-concept' |
|-----------|----------------|-------------------|
| data      | 1              |                   |
| system    | 1              |                   |
| retrieval | 1              |                   |
| lung      |                | 1                 |
| ear       |                | 1                 |
Information Filtering + LSI
Q: How to search, e.g., for 'system'?
Information Filtering + LSI
A: find the corresponding concept(s), and then the corresponding documents (via the concept-term and concept-document matrices above).
Information Filtering + LSI
Thus it works like an (automatically constructed) thesaurus:
we may retrieve documents that DON'T have the term 'system' but contain almost everything else ('data', 'retrieval').
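A minimal sketch of this idea on the slides' term-document matrix, using the SVD that the next slides introduce formally (the choice of numpy and of k = 2 concepts are assumptions of the sketch):

```python
import numpy as np

# term-document matrix from the slides
# (rows: TR1..TR4; columns: data, system, retrieval, lung, ear)
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                 # keep the two strongest concepts
Vk = Vt[:k]           # concept-term matrix (k x 5)

q = np.array([0, 1, 0, 0, 0], dtype=float)    # query: 'system'
q_concepts = Vk @ q                           # map query into concept space
doc_scores = (U[:, :k] * s[:k]) @ q_concepts  # score docs in concept space

# the DBMS documents TR1, TR2 score highest: matching goes through the
# hidden 'DBMS-concept', not the literal term
assert doc_scores[0] > doc_scores[2] and doc_scores[1] > doc_scores[3]
```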
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Motivation
• problem #1: text - LSI: find 'concepts'
• problem #2: compression / dimensionality reduction
Problem - specs: ~10^6 rows; ~10^3 columns; no updates; random access to any cell(s); small error: OK
SVD - Definition
A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• Λ: r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
• U, V: unique (*)
• U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix)
• Λ: eigenvalues are positive, and sorted in decreasing order
SVD - Example: A = U Λ V^T

A (rows: CS documents, then MD documents; columns: data, inf., retrieval, brain, lung):

    1 1 1 0 0
    2 2 2 0 0
    1 1 1 0 0
    5 5 5 0 0
    0 0 0 2 2
    0 0 0 3 3
    0 0 0 1 1

=   U            x   Λ           x   V^T
    0.18 0           9.64 0          0.58 0.58 0.58 0    0
    0.36 0           0    5.29       0    0    0    0.71 0.71
    0.18 0
    0.90 0
    0    0.53
    0    0.80
    0    0.27
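This decomposition can be verified numerically; a sketch using numpy's SVD recovers the slide's two-decimal values within rounding:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# rank 2: only two non-zero singular values, matching the slide
assert np.allclose(s[:2], [9.64, 5.29], atol=0.01)
assert np.allclose(s[2:], 0, atol=1e-8)
# U is column-orthonormal, and the product reconstructs A
assert np.allclose(U.T @ U, np.eye(5), atol=1e-8)
assert np.allclose((U * s) @ Vt, A)
```

(The exact values are sqrt(93) ≈ 9.64 and sqrt(28) ≈ 5.29; numpy may flip the signs of paired columns of U and V, which leaves the product unchanged.)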
SVD - Example (continued): the two concepts are the 'CS-concept' and the 'MD-concept'.
SVD - Example (continued): U is the doc-to-concept similarity matrix.
SVD - Example (continued): 9.64 is the 'strength' of the CS-concept.
SVD - Example (continued): V^T is the term-to-concept similarity matrix (its first row corresponds to the CS-concept).
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Interpretation #1
'documents', 'terms' and 'concepts':
• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• Λ: its diagonal elements give the 'strength' of each concept
SVD - Interpretation #2
best axis to project on ('best' = min sum of squares of projection errors)
SVD - Interpretation #2
(Figure: SVD gives the best axis v1 to project on, with minimum RMS error.)
SVD - Interpretation #2 (continued): in the running example, v1 (the first row of V^T) is the best projection axis.
SVD - Interpretation #2 (continued): the first eigenvalue (9.64) measures the variance ('spread') along the v1 axis.
SVD - Interpretation #2 (continued): U gives the coordinates of the points on the projection axes.
SVD - Interpretation #2 (more details)
Q: how exactly is dimensionality reduction done?
A: set the smallest eigenvalues to zero.
SVD - Interpretation #2: after zeroing the smaller eigenvalue (5.29 → 0):

A ~ U x diag(9.64, 0) x V^T
SVD - Interpretation #2: equivalently, drop the zeroed columns:

A ~ [0.18 0.36 0.18 0.90 0 0 0]^T x 9.64 x [0.58 0.58 0.58 0 0]
SVD - Interpretation #2: the resulting rank-1 approximation:

    1 1 1 0 0        1 1 1 0 0
    2 2 2 0 0        2 2 2 0 0
    1 1 1 0 0        1 1 1 0 0
    5 5 5 0 0   ~    5 5 5 0 0
    0 0 0 2 2        0 0 0 0 0
    0 0 0 3 3        0 0 0 0 0
    0 0 0 1 1        0 0 0 0 0
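A sketch of the truncation step with numpy: zero out all but the strongest singular value and multiply back; the CS block survives exactly and the MD block is wiped out, as on the slide.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_trunc = s.copy()
s_trunc[1:] = 0                 # set the smallest eigenvalues to zero
A1 = (U * s_trunc) @ Vt         # rank-1 approximation

expected = np.array([[1, 1, 1, 0, 0],
                     [2, 2, 2, 0, 0],
                     [1, 1, 1, 0, 0],
                     [5, 5, 5, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0]], dtype=float)
assert np.allclose(A1, expected, atol=1e-8)
```

The two blocks of A are orthogonal, so keeping only the first term reproduces the CS block exactly rather than merely approximately.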
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix (same A = U Λ V^T as in the running example).
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:

A = [u1 u2] x diag(λ1, λ2) x [v1 v2]^T
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the (n x m) matrix:

A = u1 λ1 v1^T + u2 λ2 v2^T + …   i.e.,   A = Σ_{i=1..r} λi ui vi^T
SVD - Interpretation #2
'spectral decomposition' of the matrix:

A = u1 λ1 v1^T + u2 λ2 v2^T + … (r terms; each term is an n x 1 column times a 1 x m row)
SVD - Interpretation #2
approximation / dimensionality reduction: keep only the first few terms (Q: how many?)

A ~ u1 λ1 v1^T + u2 λ2 v2^T + …   (assume λ1 >= λ2 >= …)

To do the mapping you use V^T: X' = V^T X
SVD - Interpretation #2
A (heuristic [Fukunaga]): keep 80-90% of the 'energy' (= sum of squares of the λi's), assuming λ1 >= λ2 >= …
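The energy heuristic can be sketched as a small helper (the function name `choose_k` is an assumption of the sketch, not from the slides):

```python
import numpy as np

def choose_k(singular_values, energy=0.9):
    """Smallest k such that the first k values keep >= `energy`
    of the total 'energy' (sum of squares)."""
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

s = [9.64, 5.29]   # from the running example
# 9.64^2 / (9.64^2 + 5.29^2) ~ 0.77 < 0.9, so both terms are needed
assert choose_k(s, energy=0.9) == 2
assert choose_k(s, energy=0.7) == 1
```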
SVD - Interpretation #3
finds non-zero 'blobs' in a data matrix (in the running example: the CS block and the MD block).
SVD - Interpretation #3
Drill: find the SVD, 'by inspection'! Q: rank = ??

    1 1 1 0 0
    1 1 1 0 0
    1 1 1 0 0
    0 0 0 1 1
    0 0 0 1 1

= ?? x ?? x ??
SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols), so U and V have 2 columns each, and Λ = diag(??, ??).
SVD - Interpretation #3
A: rank = 2; a first guess:

    1 1 1 0 0       1 0
    1 1 1 0 0       1 0
    1 1 1 0 0   =   1 0   x   ?? 0    x   1 1 1 0 0
    0 0 0 1 1       0 1       0  ??       0 0 0 1 1
    0 0 0 1 1       0 1

(are the columns orthogonal??)
SVD - Interpretation #3
column vectors: are orthogonal - but not unit vectors; normalizing:

    1 1 1 0 0       1/√3 0
    1 1 1 0 0       1/√3 0        ?? 0        1/√3 1/√3 1/√3 0    0
A = 1 1 1 0 0   =   1/√3 0    x   0  ??   x   0    0    0    1/√2 1/√2
    0 0 0 1 1       0    1/√2
    0 0 0 1 1       0    1/√2
![Page 87: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/87.jpg)
SVD - Interpretation #3
and the eigenvalues are 3 and 2:

    1 1 1 0 0       1/√3 0
    1 1 1 0 0       1/√3 0        3 0       1/√3 1/√3 1/√3 0    0
A = 1 1 1 0 0   =   1/√3 0    x   0 2   x   0    0    0    1/√2 1/√2
    0 0 0 1 1       0    1/√2
    0 0 0 1 1       0    1/√2
![Page 88: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/88.jpg)
SVD - Interpretation #3
A: SVD properties:
the matrix product U Σ V^T should give back the matrix A
the matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other
ditto for the matrix V
the matrix Σ should be diagonal, with positive values
![Page 89: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/89.jpg)
SVD - Complexity
O(n * m * m) or O(n * n * m), whichever is less.
Less work if we just want the eigenvalues, or only the first k eigenvectors, or if the matrix is sparse [Berry].
Implemented in any linear algebra package (LINPACK, Matlab, S-plus, Mathematica, ...)
![Page 90: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/90.jpg)
Optimality of SVD
Def: The Frobenius norm of an n x m matrix M is

||M||_F = sqrt( Σ_{i,j} M[i,j]^2 )

(reminder) The rank of a matrix M is the number of independent rows (or columns) of M
Let A = U Σ V^T and A_k = U_k Σ_k V_k^T (the rank-k SVD approximation of A); A_k is an n x m matrix, U_k is n x k, Σ_k is k x k, and V_k is m x k
Theorem [Eckart and Young]: Among all n x m matrices C of rank at most k, we have that:

||A - A_k||_F <= ||A - C||_F
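As a quick illustration of the theorem (our sketch, reusing the rank-2 drill matrix from the earlier slides): truncating the SVD to rank k gives the best rank-k approximation, and the Frobenius error is exactly the norm of the discarded singular values.

```python
import numpy as np

# the rank-2 'drill' matrix from the earlier slides
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A)            # singular values: 3, 2, 0, 0, 0

k = 1
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-1 approximation

# Eckart-Young: the error is the norm of the dropped singular values
err = np.linalg.norm(A - A_k, 'fro')
print(round(err, 6))                   # 2.0 (= sigma_2)
```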
![Page 91: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/91.jpg)
Kleinberg’s Algorithm
Main idea: in many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or contain them only a few times). E.g., Harvard: www.harvard.edu; Search Engines: yahoo, google, altavista
Authorities and hubs
![Page 92: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/92.jpg)
Kleinberg’s algorithm
Problem definition: given the web and a query, find the most ‘authoritative’ web pages for this query
Step 0: find all pages containing the query terms (root set)
Step 1: expand by one move forward and backward (base set)
![Page 93: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/93.jpg)
Kleinberg’s algorithm
Step 1: expand by one move forward and backward
![Page 94: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/94.jpg)
Kleinberg’s algorithm
on the resulting graph, give a high score (= ‘authorities’) to nodes that many important nodes point to
give a high importance score (‘hubs’) to nodes that point to good ‘authorities’
hubs → authorities
![Page 95: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/95.jpg)
Kleinberg’s algorithm
observations: a recursive definition! each node (say, the i-th node) has both an authoritativeness score ai and a hubness score hi
![Page 96: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/96.jpg)
Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: A(i,j) is 1 if the edge from i to j exists, and 0 otherwise
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores.
Then:
![Page 97: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/97.jpg)
Kleinberg’s algorithm
Then: ai = hk + hl + hm
that is, ai = Σ hj over all j such that the edge (j,i) exists
or: a = A^T h

(figure: nodes k, l, m each pointing to node i)
![Page 98: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/98.jpg)
Kleinberg’s algorithm
symmetrically, for the ‘hubness’:
hi = an + ap + aq
that is, hi = Σ aj over all j such that the edge (i,j) exists
or: h = A a

(figure: node i pointing to nodes n, p, q)
![Page 99: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/99.jpg)
Kleinberg’s algorithm
In conclusion, we want vectors h and a such that:
h = A a
a = A^T h
Recall the SVD properties:
C(2): A [n x m] v1 [m x 1] = σ1 u1 [n x 1]
C(3): u1^T A = σ1 v1^T
![Page 100: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/100.jpg)
Kleinberg’s algorithm
In short, the solutions to
h = A a
a = A^T h
are the left- and right- eigenvectors of the adjacency matrix A.
Starting from a random a’ and iterating, we’ll eventually converge
(Q: to which of all the eigenvectors? why?)
![Page 101: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/101.jpg)
Kleinberg’s algorithm
(Q: to which of all the eigenvectors? why?)
A: to the ones of the strongest eigenvalue, because of property B(5):
B(5): (A^T A)^k v’ ~ (constant) v1
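The iteration a = A^T h, h = A a (with normalization, so property B(5) drives it to the strongest eigenvector) can be sketched in a few lines; the function name and toy graph below are ours:

```python
import numpy as np

def hits(A, iters=50):
    """Power iteration for hub (h) and authority (a) scores;
    A is the adjacency matrix."""
    n = A.shape[0]
    a = np.ones(n)
    h = np.ones(n)
    for _ in range(iters):
        a = A.T @ h              # a = A^T h
        h = A @ a                # h = A a
        a /= np.linalg.norm(a)   # normalize so the scores don't blow up
        h /= np.linalg.norm(h)
    return h, a

# toy graph: pages 0 and 1 both point to page 2
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
h, a = hits(A)
# page 2 gets the authority score; pages 0 and 1 are the (equal) hubs
```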
![Page 102: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/102.jpg)
Kleinberg’s algorithm - results
E.g., for the query ‘java’:
0.328 www.gamelan.com
0.251 java.sun.com
0.190 www.digitalfocus.com (“the java developer”)
![Page 103: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/103.jpg)
Kleinberg’s algorithm - discussion
the ‘authority’ score can be used to find pages ‘similar’ to a page p
closely related to ‘citation analysis’, social networks / ‘small world’ phenomena
![Page 104: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/104.jpg)
google/page-rank algorithm
closely related: The Web is a directed graph of connected nodes
imagine a particle randomly moving along the edges (*)
compute its steady-state probabilities. That gives the PageRank of each page (the importance of this page)
(*) with occasional random jumps
![Page 105: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/105.jpg)
PageRank Definition
Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tm)/C(Tm) )
![Page 106: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/106.jpg)
google/page-rank algorithm
Compute the PR of each page - an identical problem: given a Markov Chain, compute the steady state probabilities p1 ... p5

(figure: 5-node example graph, nodes 1-5)
![Page 107: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/107.jpg)
Computing PageRank
Iterative procedure. Also: navigate the web by randomly following links, or with probability p jump to a random page. Let A be the adjacency matrix (n x n) and di the out-degree of page i:

Prob(Ai -> Aj) = p/n + (1 - p) Aij/di
A’[i,j] = Prob(Ai -> Aj)
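The matrix A’[i,j] = p/n + (1-p) Aij/di can be built directly; a small NumPy sketch (function name and toy web are ours; it assumes every page has at least one out-link):

```python
import numpy as np

def transition_matrix(A, p=0.15):
    """Row-stochastic matrix A': with probability p jump to a random
    page, otherwise follow one of the page's out-links uniformly."""
    n = A.shape[0]
    d = A.sum(axis=1, keepdims=True)   # out-degrees d_i (assumed > 0)
    return p / n + (1 - p) * A / d

# toy 3-page web: 0 -> {1, 2}, 1 -> {0}, 2 -> {1}
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
Ap = transition_matrix(A)
print(np.allclose(Ap.sum(axis=1), 1))  # True: every row sums to 1
```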
![Page 108: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/108.jpg)
google/page-rank algorithm
Let A’ be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)

(figure: the 5-node example graph and its transition matrix, with entries 1, 1/2, ..., multiplying the vector [p1 ... p5])
![Page 109: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/109.jpg)
google/page-rank algorithm
A p = p

(figure: the same 5-node example and transition matrix, now read as the eigenvector equation)
![Page 110: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/110.jpg)
google/page-rank algorithm
A p = p
thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized)
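Power iteration finds exactly this eigenvector; a sketch on a toy 2-state chain (ours, not the slides’ 5-node example):

```python
import numpy as np

def steady_state(Ap, iters=100):
    """Power iteration for the steady-state vector of the
    row-stochastic matrix Ap (the fixed point of A'^T p = p)."""
    n = Ap.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p = Ap.T @ p          # one step of the random walk
        p /= p.sum()          # keep p a probability vector
    return p

# two-state chain: from state 0 always go to 1; from 1, 50/50
Ap = np.array([[0.0, 1.0],
               [0.5, 0.5]])
p = steady_state(Ap)
print(np.round(p, 3))  # [0.333 0.667]
```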
![Page 111: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/111.jpg)
Kleinberg/google - conclusions
SVD helps in graph analysis:
hub/authority scores: strongest left- and right- eigenvectors of the adjacency matrix
random walk on a graph: steady state probabilities are given by the strongest eigenvector of the transition matrix
![Page 112: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/112.jpg)
Conclusions – so far
SVD: a valuable tool
given a document-term matrix, it finds ‘concepts’ (LSI)
... and can reduce dimensionality (KL)
![Page 113: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/113.jpg)
Conclusions cont’d
... and can find fixed-points or steady-state probabilities (google/ Kleinberg/ Markov Chains)
... and can solve over- and under-constrained linear systems optimally (least squares)
![Page 114: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/114.jpg)
References
Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl. World Wide Web Conf.
Kleinberg, J. (1998). Authoritative Sources in a Hyperlinked Environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
![Page 115: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/115.jpg)
Embeddings
Given a metric distance matrix D, embed the objects in a k-dimensional vector space using a mapping F such that D(i,j) is close to D’(F(i),F(j))
Isometric mapping: exact preservation of distances
Contractive mapping: D’(F(i),F(j)) <= D(i,j)
D’ is some Lp measure
![Page 116: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/116.jpg)
PCA
Intuition: find the axis that shows the greatest variation, and project all points onto this axis

(figure: 2-d points with original axes f1, f2 and principal directions e1, e2)
![Page 117: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/117.jpg)
SVD: The mathematical formulation
Normalize the dataset by moving the origin to the center of the dataset
Find the eigenvectors of the data (or covariance) matrix
These define the new space
Sort the eigenvalues in “goodness” order

(figure: the same 2-d example, axes f1, f2 and eigenvectors e1, e2)
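The steps above amount to PCA via SVD; a minimal sketch (the function name and toy data are ours):

```python
import numpy as np

def pca(X, k):
    """PCA via SVD: center the data, use the top-k right singular
    vectors as the new axes, and project onto them."""
    Xc = X - X.mean(axis=0)                  # move the origin to the center
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # coordinates along e1..ek

# points near the line y = 2x: one axis carries almost all the variation
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=100)])
Y = pca(X, 1)
print(Y.shape)  # (100, 1)
```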
![Page 118: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/118.jpg)
SVD Cont’d
Advantages:
Optimal dimensionality reduction (for linear projections)
Disadvantages:
Computationally expensive… but can be improved with random sampling
Sensitive to outliers and non-linearities
![Page 119: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/119.jpg)
FastMap
What if we have a finite metric space (X, d)? Faloutsos and Lin (1995) proposed FastMap as a metric analogue of the KL-transform (PCA). Imagine that the points are in a Euclidean space.
Select two pivot points xa and xb that are far apart.
Compute a pseudo-projection of the remaining points along the “line” xaxb.
“Project” the points onto an orthogonal subspace and recurse.
![Page 120: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/120.jpg)
Selecting the Pivot Points
The pivot points should lie along the principal axes, and hence should be far apart.
Select any point x0. Let x1 be the point furthest from x0. Let x2 be the point furthest from x1. Return (x1, x2).

(figure: points x0, x1, x2)
![Page 121: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/121.jpg)
Pseudo-Projections
Given pivots (xa, xb), for any third point y, we use the law of cosines to determine the position cy of y along the line xa-xb:

d(b,y)^2 = d(a,y)^2 + d(a,b)^2 - 2 cy d(a,b)

Solving for cy, the pseudo-projection for y is

cy = ( d(a,y)^2 + d(a,b)^2 - d(b,y)^2 ) / ( 2 d(a,b) )

This is the first coordinate.

(figure: triangle xa, xb, y with sides d(a,y), d(b,y), d(a,b) and projection cy)
![Page 122: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/122.jpg)
“Project to orthogonal plane”
Given the distances along xa-xb, we can compute distances within the “orthogonal hyperplane” using the Pythagorean theorem:

d’(y’, z’) = sqrt( d(y, z)^2 - (cz - cy)^2 )

Using d’(.,.), recurse until k features are chosen.

(figure: points y, z, their projections y’, z’ on the orthogonal plane, and the distances d(y,z), d’(y’,z’), cz - cy)
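Putting the pivot selection, the law-of-cosines pseudo-projection, and the recursion together, a compact FastMap sketch (helper names are ours; illustrative only):

```python
import math

def fastmap(objects, d, k):
    """Sketch of FastMap (Faloutsos & Lin, 1995): embed `objects`,
    given only a distance function d, into k dimensions."""
    coords = {o: [] for o in objects}

    def dist(y, z):
        # residual distance: original distance minus what the coordinates
        # found so far already explain (Pythagorean theorem)
        done = sum((cy - cz) ** 2 for cy, cz in zip(coords[y], coords[z]))
        return math.sqrt(max(d(y, z) ** 2 - done, 0.0))

    for _ in range(k):
        # pivot heuristic: xa furthest from an arbitrary x0, xb furthest from xa
        x0 = objects[0]
        xa = max(objects, key=lambda o: dist(x0, o))
        xb = max(objects, key=lambda o: dist(xa, o))
        dab = dist(xa, xb)
        # law-of-cosines pseudo-projection of every point onto the line xa-xb
        new = {y: 0.0 if dab == 0 else
               (dist(xa, y) ** 2 + dab ** 2 - dist(xb, y) ** 2) / (2 * dab)
               for y in objects}
        for y in objects:
            coords[y].append(new[y])
    return coords

# toy example: points on a line; 1-d FastMap recovers their spacing
pts = [0.0, 1.0, 3.0, 7.0]
emb = fastmap(pts, lambda a, b: abs(a - b), 1)
```

On this toy input the 1-d embedding preserves all pairwise distances exactly (up to shift and reflection).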
![Page 123: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/123.jpg)
Random Projections
Based on the Johnson-Lindenstrauss lemma. For any 0 < ε < 1/2 and any (sufficiently large) set S of M points in R^n, let k = O(ε^-2 ln M). Then there exists a linear map f: S -> R^k such that, for all u, v in S:
(1 - ε) D(u,v) < D(f(u),f(v)) < (1 + ε) D(u,v)
A random projection achieves this with constant probability.
![Page 124: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/124.jpg)
Random Projection: Application
Set k = O(ε^-2 ln M)
Select k random n-dimensional vectors (one approach: vectors with i.i.d. Gaussian entries of mean 0 and variance 1, i.e., N(0,1))
Project the original points onto the k vectors.
The resulting k-dimensional space approximately preserves the distances with high probability.
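A NumPy sketch of the procedure (ours; the sizes below are arbitrary illustrations, not the lemma’s exact bound):

```python
import numpy as np

rng = np.random.default_rng(42)

n, k, M = 1000, 300, 50
X = rng.normal(size=(M, n))               # M points in R^n

# k random directions with i.i.d. N(0,1) entries, scaled by 1/sqrt(k)
R = rng.normal(size=(n, k)) / np.sqrt(k)
Y = X @ R                                  # project into R^k

def pdist(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))    # all pairwise distances

D, Dp = pdist(X), pdist(Y)
mask = ~np.eye(M, dtype=bool)
distortion = np.abs(Dp[mask] / D[mask] - 1).max()
print(distortion < 0.5)                    # True: distances roughly preserved
```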
![Page 125: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/125.jpg)
Random Projection
A very useful technique, especially when used in conjunction with another technique (for example SVD)
Use random projection to reduce the dimensionality from thousands to hundreds, then apply SVD to reduce the dimensionality further