Multimedia Indexing and Dimensionality Reduction
Multimedia Data Management
• The need to query and analyze vast amounts of multimedia data (e.g., images, sound tracks, video tracks) has increased in recent years.
• Joint research from database management, computer vision, signal processing, and pattern recognition aims to solve problems related to multimedia data management.
Multimedia Data
• There are four major types of multimedia data: images, video sequences, sound tracks, and text.
• Of these, the easiest type to manage is text, since we can order, index, and search text using string-management techniques.
• Management of simple sounds is also possible by representing audio as signal sequences over different channels.
• Image retrieval has received a lot of attention in the last decade (CV and DBs). The main techniques can also be extended and applied to video retrieval.
Content-based Image Retrieval
• Images were traditionally managed by first annotating their contents and then using text-retrieval techniques to index them.
• However, with the increase of information in digital image format, some drawbacks of this technique were revealed:
  • Manual annotation requires a vast amount of labor.
  • Different people may perceive the contents of an image differently; thus no objective keywords for search are defined.
• A new research field was born in the 90's: Content-based Image Retrieval aims at indexing and retrieving images based on their visual contents.
Feature Extraction
• The basis of Content-based Image Retrieval is to extract and index some visual features of the images.
• There are general features (e.g., color, texture, shape) and domain-specific features (e.g., objects contained in the image).
• Domain-specific feature extraction can vary with the application domain and is based on pattern recognition.
• On the other hand, general features can be used independently of the image domain.
Color Features
• To represent the color of an image compactly, a color histogram is used. Colors are partitioned into k groups according to their similarity, and the percentage of each group in the image is measured.
• Images are thus transformed into k-dimensional points, and a distance metric (e.g., Euclidean distance) is used to measure the similarity between them.
(Figure: image → k-bin histogram → point in k-dimensional space.)
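As a minimal sketch of the idea above (not code from the slides), the k-bin histogram and the Euclidean distance on the resulting k-dimensional points might look as follows; the bin count k = 8 and the synthetic "images" are illustrative assumptions:

```python
import numpy as np

def color_histogram(pixels, k=8):
    """Map an image (array of color-index values in [0, 256)) to a
    k-dimensional point: the fraction of pixels falling in each of k bins."""
    counts, _ = np.histogram(pixels, bins=k, range=(0, 256))
    return counts / counts.sum()  # percentages, so image size cancels out

def euclidean(p, q):
    return float(np.linalg.norm(p - q))

# Two synthetic "images": one dark, one bright.
dark = np.full(1000, 30)
bright = np.full(1000, 220)
h1, h2 = color_histogram(dark), color_histogram(bright)
assert h1.shape == (8,) and abs(h1.sum() - 1.0) < 1e-9
# a similar dark image is closer to h1 than the bright one is
assert euclidean(h1, color_histogram(np.full(500, 20))) < euclidean(h1, h2)
```

Because the histogram stores percentages, two images of different sizes but similar color content map to nearby k-dimensional points.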
Using Transformations to Reduce Dimensionality
• In many cases the embedded dimensionality of a search problem is much lower than the actual dimensionality.
• Some methods apply transformations on the data and approximate them with low-dimensional vectors.
• The aim is to reduce dimensionality and at the same time maintain the data characteristics.
• If d(a,b) is the distance between two objects a, b in the real (high-dimensional) space and d'(a',b') is their distance in the transformed low-dimensional space, we want d'(a',b') ≤ d(a,b), so that no qualifying object is missed when filtering in the low-dimensional space.
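The lower-bounding requirement d'(a',b') ≤ d(a,b) can be illustrated with the simplest possible transform, truncating vectors to their first k coordinates (an illustrative stand-in for the transformations discussed here): dropping squared terms can only shrink a Euclidean distance.

```python
import numpy as np

def project(x, k):
    """Keep only the first k coordinates: a trivially contractive transform
    for Euclidean distance, since dropping terms from a sum of squares
    can only make it smaller."""
    return x[:k]

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)
d = float(np.linalg.norm(a - b))                        # original space
d_low = float(np.linalg.norm(project(a, 8) - project(b, 8)))  # reduced space
assert d_low <= d  # no false dismissals when filtering with d'
```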
Problem - Motivation
Given a database of documents, find the documents containing "data", "retrieval".
Applications:
• Web
• law + patent offices
• digital libraries
• information filtering
Problem - Motivation
Types of queries:
• boolean ('data' AND 'retrieval' AND NOT ...)
• additional features ('data' ADJACENT 'retrieval')
• keyword queries ('data', 'retrieval')
How to search a large collection of documents?
Text – Inverted Files
Q: space overhead?
A: mainly, the postings lists
how to organize the dictionary?
stemming – Y/N? (keep only the root of each word, e.g., inverted, inversion → invert)
insertions?
how to organize the dictionary? B-tree, hashing, TRIEs, PATRICIA trees, ...
stemming – Y/N? insertions?
postings lists – more. Zipf distribution: e.g., rank-frequency plot of the 'Bible':
freq ≈ 1 / (rank · ln(1.78 V))
(Figure: log(freq) vs. log(rank) is roughly a straight line; V is the vocabulary size.)
postings lists:
• Cutting+Pedersen (keep first 4 in B-tree leaves)
• how to allocate space: [Faloutsos+92] geometric progression
• compression (Elias codes) [Zobel+] – down to 2% overhead!
Conclusions: inversion needs space overhead (2%-300%), but it is the fastest.
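A toy sketch of an inverted file, with a dictionary mapping each term to its postings list (the sample documents and the plain dict layout are illustrative assumptions, not the B-tree/compressed layouts cited above):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Dictionary of term -> postings list (ascending doc ids)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)
    return index

docs = ["data retrieval systems",
        "signal processing",
        "data management and data retrieval"]
index = build_inverted_index(docs)

# boolean AND query: intersect the postings lists
hits = set(index["data"]) & set(index["retrieval"])
assert hits == {0, 2}
```

Since postings lists are kept sorted, real systems intersect them with a merge rather than via sets, and compress them (e.g., Elias codes) to cut the space overhead.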
Text - Detailed outline
• Text databases problem
• full text scanning
• inversion
• signature files (a.k.a. Bloom Filters)
• Vector model and clustering
• information filtering and LSI
Vector Space Model and Clustering
Keyword (free-text) queries (vs. Boolean):
• each document -> vector (HOW?)
• each query -> vector
• search for 'similar' vectors
Vector Space Model and Clustering
main idea: each document is a vector of size d, where d is the number of different terms in the database (the vocabulary size)
(Figure: a document containing '...data...' maps to a d-dimensional vector with one coordinate per vocabulary term, from 'aaron' to 'zoo'; the coordinate for 'data' is set during 'indexing'.)
Document Vectors
Documents are represented as "bags of words" OR as vectors.
• A vector is like an array of floating-point numbers
• It has direction and magnitude
• Each vector holds a place for every term in the collection
• Therefore, most vectors are sparse
Document Vectors: one location for each word.

| docs | nova | galaxy | heat | h'wood | film | role | diet | fur |
|------|------|--------|------|--------|------|------|------|-----|
| A    | 10   | 5      | 3    |        |      |      |      |     |
| B    | 5    | 10     |      |        |      |      |      |     |
| C    |      | 10     | 8    | 7      |      |      |      |     |
| D    |      |        |      | 9      | 10   | 5    |      |     |
| E    |      |        |      |        | 10   | 10   |      |     |
| F    |      |        |      | 9      | 10   |      |      |     |
| G    | 5    | 7      |      | 9      |      |      |      |     |
| H    |      |        |      | 6      | 10   |      | 2    | 8   |
| I    |      |        |      | 7      | 5    |      | 1    | 3   |

"Nova" occurs 10 times in text A, "Galaxy" 5 times, "Heat" 3 times. (Blank means 0 occurrences.)
Document Vectors: one location for each word (same table as above).
"Hollywood" occurs 7 times in text I, "Film" 5 times, "Diet" 1 time, "Fur" 3 times.
Document Vectors: A–I in the table above are document ids.
We Can Plot the Vectors
(Figure: axes 'Star' and 'Diet'; the docs about astronomy and about movie stars lie near the 'Star' axis, the doc about mammal behavior near the 'Diet' axis.)
Assigning Weights to Terms
• Binary weights
• Raw term frequency
• tf x idf
Recall the Zipf distribution: we want to weight terms highly if they are frequent in relevant documents … BUT infrequent in the collection as a whole.
Binary Weights
Only the presence (1) or absence (0) of a term is included in the vector.

| docs | t1 | t2 | t3 |
|------|----|----|----|
| D1   | 1  | 0  | 1  |
| D2   | 1  | 0  | 0  |
| D3   | 0  | 1  | 1  |
| D4   | 1  | 0  | 0  |
| D5   | 1  | 1  | 1  |
| D6   | 1  | 1  | 0  |
| D7   | 0  | 1  | 0  |
| D8   | 0  | 1  | 0  |
| D9   | 0  | 0  | 1  |
| D10  | 0  | 1  | 1  |
| D11  | 1  | 0  | 1  |
Raw Term Weights
The frequency of occurrence of the term in each document is included in the vector.

| docs | t1 | t2 | t3 |
|------|----|----|----|
| D1   | 2  | 0  | 3  |
| D2   | 1  | 0  | 0  |
| D3   | 0  | 4  | 7  |
| D4   | 3  | 0  | 0  |
| D5   | 1  | 6  | 3  |
| D6   | 3  | 5  | 0  |
| D7   | 0  | 8  | 0  |
| D8   | 0  | 10 | 0  |
| D9   | 0  | 0  | 1  |
| D10  | 0  | 3  | 5  |
| D11  | 4  | 0  | 1  |
Assigning Weights
tf x idf measure:
• term frequency (tf)
• inverse document frequency (idf) -- a way to deal with the problems of the Zipf distribution
Goal: assign a tf x idf weight to each term in each document.
tf x idf
w_ik = tf_ik · log(N / n_k)

where:
• T_k = term k
• tf_ik = frequency of term T_k in document D_i
• idf_k = inverse document frequency of term T_k in collection C: idf_k = log(N / n_k)
• N = total number of documents in the collection C
• n_k = number of documents in C that contain T_k
Inverse Document Frequency
IDF provides high values for rare words and low values for common words
For a collection of 10000 documents:
• term in 1 document: idf = log(10000/1) = 4
• term in 20 documents: idf = log(10000/20) = 2.698
• term in 5000 documents: idf = log(10000/5000) = 0.301
• term in 10000 documents: idf = log(10000/10000) = 0
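The values above can be reproduced with a base-10 logarithm, which these slides evidently use:

```python
import math

N = 10000  # documents in the collection
for n_k, expected in [(1, 4.0), (20, 2.698), (5000, 0.301), (10000, 0.0)]:
    idf = math.log10(N / n_k)          # idf_k = log(N / n_k)
    assert abs(idf - expected) < 0.005  # matches the slide to 3 decimals
    print(f"term in {n_k:5d} docs -> idf = {idf:.3f}")
```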
Similarity Measures for document vectors
For Boolean (set-valued) representations of a query Q and a document D:
• Simple matching (coordination level match): |Q ∩ D|
• Dice's Coefficient: 2 |Q ∩ D| / (|Q| + |D|)
• Jaccard's Coefficient: |Q ∩ D| / |Q ∪ D|
• Cosine Coefficient: |Q ∩ D| / (|Q|^(1/2) · |D|^(1/2))
• Overlap Coefficient: |Q ∩ D| / min(|Q|, |D|)
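A small sketch of the five coefficients on Python sets (the sample query and document are illustrative assumptions):

```python
def simple_match(q, d): return len(q & d)
def dice(q, d):        return 2 * len(q & d) / (len(q) + len(d))
def jaccard(q, d):     return len(q & d) / len(q | d)
def cosine(q, d):      return len(q & d) / (len(q) ** 0.5 * len(d) ** 0.5)
def overlap(q, d):     return len(q & d) / min(len(q), len(d))

Q = {"data", "retrieval"}
D = {"data", "retrieval", "management", "systems"}
assert simple_match(Q, D) == 2
assert dice(Q, D) == 2 * 2 / (2 + 4)
assert jaccard(Q, D) == 2 / 4
assert overlap(Q, D) == 1.0   # Q is fully contained in D
assert abs(cosine(Q, D) - 2 / (2 ** 0.5 * 2)) < 1e-12
```

Note how the overlap coefficient saturates at 1 as soon as one set is contained in the other, while Jaccard still penalizes the size difference.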
tf x idf normalization
• Normalize the term weights (so longer documents are not unfairly given more weight).
• "Normalize" usually means force all values to fall within a certain range, usually between 0 and 1, inclusive.

w_ik = tf_ik · log(N / n_k) / sqrt( Σ_{j=1..t} (tf_ij)² · [log(N / n_j)]² )
Vector space similarity (use the weights to compare the documents)
Now, the similarity of two documents is:

sim(D_i, D_j) = Σ_{k=1..t} w_ik · w_jk

This is also called the cosine, or normalized inner product.
Computing Similarity Scores
With Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7):
• cos θ1 = 0.74 (angle between Q and D1)
• cos θ2 = 0.98 (angle between Q and D2)
(Figure: the three vectors plotted in the unit square; Q points much closer to D2.)
Vector Space with Term Weights and Cosine Matching
D_i = (d_i1, w_di1; d_i2, w_di2; …; d_it, w_dit)
Q = (q_1, w_q1; q_2, w_q2; …; q_t, w_qt)

sim(Q, D_i) = Σ_{j=1..t} w_qj · w_dij / sqrt( Σ_{j=1..t} (w_qj)² · Σ_{j=1..t} (w_dij)² )

Example: Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7)

sim(Q, D2) = (0.4·0.2 + 0.8·0.7) / sqrt( [(0.4)² + (0.8)²] · [(0.2)² + (0.7)²] ) = 0.64 / sqrt(0.42) ≈ 0.98

sim(Q, D1) = 0.56 / sqrt(0.58) ≈ 0.74
(Figure: Q, D1, D2 plotted against the Term A / Term B axes.)
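The worked example can be checked directly; this sketch recovers the slide's 0.98 and 0.74 within 0.01 (the slide rounds its intermediate values):

```python
import math

def cosine_sim(q, d):
    """Cosine of the angle between two weight vectors."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    return dot / math.sqrt(sum(w * w for w in q) * sum(w * w for w in d))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
assert abs(cosine_sim(Q, D2) - 0.98) < 0.01   # 0.64 / sqrt(0.424...)
assert abs(cosine_sim(Q, D1) - 0.74) < 0.01   # 0.56 / sqrt(0.584...)
```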
Text - Detailed outline
• Text databases problem
• full text scanning
• inversion
• signature files (a.k.a. Bloom Filters)
• Vector model and clustering
• information filtering and LSI
Information Filtering + LSI [Foltz+,'92]
Goal: users specify interests (= keywords); the system alerts them on suitable news documents.
Major contribution: LSI = Latent Semantic Indexing
• latent ('hidden') concepts
Information Filtering + LSI
Main idea:
• map each document into some 'concepts'
• map each term into some 'concepts'
'Concept': ~ a set of terms, with weights, e.g., "data" (0.8), "system" (0.5), "retrieval" (0.6) -> DBMS_concept
Information Filtering + LSI
Pictorially: term-document matrix (BEFORE) ...

|     | 'data' | 'system' | 'retrieval' | 'lung' | 'ear' |
|-----|--------|----------|-------------|--------|-------|
| TR1 | 1      | 1        | 1           |        |       |
| TR2 | 1      | 1        | 1           |        |       |
| TR3 |        |          |             | 1      | 1     |
| TR4 |        |          |             | 1      | 1     |
Information Filtering + LSI
Pictorially: concept-document matrix ... and

|     | 'DBMS-concept' | 'medical-concept' |
|-----|----------------|-------------------|
| TR1 | 1              |                   |
| TR2 | 1              |                   |
| TR3 |                | 1                 |
| TR4 |                | 1                 |
Information Filtering + LSI
... and concept-term matrix

|           | 'DBMS-concept' | 'medical-concept' |
|-----------|----------------|-------------------|
| data      | 1              |                   |
| system    | 1              |                   |
| retrieval | 1              |                   |
| lung      |                | 1                 |
| ear       |                | 1                 |
Information Filtering + LSI
Q: How to search, e.g., for 'system'?
Information Filtering + LSI
A: find the corresponding concept(s), and then the corresponding documents (via the concept-term and concept-document matrices above).
Information Filtering + LSI
Thus it works like an (automatically constructed) thesaurus:
we may retrieve documents that DON'T have the term 'system' but contain almost everything else ('data', 'retrieval').
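A minimal sketch of this idea on the slides' term-document matrix, using the SVD that the next slides introduce formally (the choice of numpy and of k = 2 concepts are assumptions of the sketch):

```python
import numpy as np

# term-document matrix from the slides
# (rows: TR1..TR4; columns: data, system, retrieval, lung, ear)
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                 # keep the two strongest concepts
Vk = Vt[:k]           # concept-term matrix (k x 5)

q = np.array([0, 1, 0, 0, 0], dtype=float)    # query: 'system'
q_concepts = Vk @ q                           # map query into concept space
doc_scores = (U[:, :k] * s[:k]) @ q_concepts  # score docs in concept space

# the DBMS documents TR1, TR2 score highest: matching goes through the
# hidden 'DBMS-concept', not the literal term
assert doc_scores[0] > doc_scores[2] and doc_scores[1] > doc_scores[3]
```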
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Motivation
• problem #1: text - LSI: find 'concepts'
• problem #2: compression / dimensionality reduction
Problem - specs: ~10^6 rows; ~10^3 columns; no updates; random access to any cell(s); small error: OK
SVD - Definition
A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• Λ: r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
• U, V: unique (*)
• U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix)
• Λ: eigenvalues are positive, and sorted in decreasing order
SVD - Example: A = U Λ V^T

A (rows: CS documents, then MD documents; columns: data, inf., retrieval, brain, lung):

    1 1 1 0 0
    2 2 2 0 0
    1 1 1 0 0
    5 5 5 0 0
    0 0 0 2 2
    0 0 0 3 3
    0 0 0 1 1

=   U            x   Λ           x   V^T
    0.18 0           9.64 0          0.58 0.58 0.58 0    0
    0.36 0           0    5.29       0    0    0    0.71 0.71
    0.18 0
    0.90 0
    0    0.53
    0    0.80
    0    0.27
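This decomposition can be verified numerically; a sketch using numpy's SVD recovers the slide's two-decimal values within rounding:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# rank 2: only two non-zero singular values, matching the slide
assert np.allclose(s[:2], [9.64, 5.29], atol=0.01)
assert np.allclose(s[2:], 0, atol=1e-8)
# U is column-orthonormal, and the product reconstructs A
assert np.allclose(U.T @ U, np.eye(5), atol=1e-8)
assert np.allclose((U * s) @ Vt, A)
```

(The exact values are sqrt(93) ≈ 9.64 and sqrt(28) ≈ 5.29; numpy may flip the signs of paired columns of U and V, which leaves the product unchanged.)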
SVD - Example (continued): the two concepts are the 'CS-concept' and the 'MD-concept'.
SVD - Example (continued): U is the doc-to-concept similarity matrix.
SVD - Example (continued): 9.64 is the 'strength' of the CS-concept.
SVD - Example (continued): V^T is the term-to-concept similarity matrix (its first row corresponds to the CS-concept).
SVD - Detailed outline
• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties
SVD - Interpretation #1
'documents', 'terms' and 'concepts':
• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• Λ: its diagonal elements give the 'strength' of each concept
SVD - Interpretation #2
best axis to project on ('best' = min sum of squares of projection errors)
SVD - Interpretation #2
(Figure: SVD gives the best axis v1 to project on, with minimum RMS error.)
SVD - Interpretation #2 (continued): in the running example, v1 (the first row of V^T) is the best projection axis.
SVD - Interpretation #2 (continued): the first eigenvalue (9.64) measures the variance ('spread') along the v1 axis.
SVD - Interpretation #2 (continued): U gives the coordinates of the points on the projection axes.
SVD - Interpretation #2 (more details)
Q: how exactly is dimensionality reduction done?
A: set the smallest eigenvalues to zero.
SVD - Interpretation #2: after zeroing the smaller eigenvalue (5.29 → 0):

A ~ U x diag(9.64, 0) x V^T
SVD - Interpretation #2: equivalently, drop the zeroed columns:

A ~ [0.18 0.36 0.18 0.90 0 0 0]^T x 9.64 x [0.58 0.58 0.58 0 0]
SVD - Interpretation #2: the resulting rank-1 approximation:

    1 1 1 0 0        1 1 1 0 0
    2 2 2 0 0        2 2 2 0 0
    1 1 1 0 0        1 1 1 0 0
    5 5 5 0 0   ~    5 5 5 0 0
    0 0 0 2 2        0 0 0 0 0
    0 0 0 3 3        0 0 0 0 0
    0 0 0 1 1        0 0 0 0 0
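A sketch of the truncation step with numpy: zero out all but the strongest singular value and multiply back; the CS block survives exactly and the MD block is wiped out, as on the slide.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_trunc = s.copy()
s_trunc[1:] = 0                 # set the smallest eigenvalues to zero
A1 = (U * s_trunc) @ Vt         # rank-1 approximation

expected = np.array([[1, 1, 1, 0, 0],
                     [2, 2, 2, 0, 0],
                     [1, 1, 1, 0, 0],
                     [5, 5, 5, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0]], dtype=float)
assert np.allclose(A1, expected, atol=1e-8)
```

The two blocks of A are orthogonal, so keeping only the first term reproduces the CS block exactly rather than merely approximately.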
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix (same A = U Λ V^T as in the running example).
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:

A = [u1 u2] x diag(λ1, λ2) x [v1 v2]^T
SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the (n x m) matrix:

A = u1 λ1 v1^T + u2 λ2 v2^T + …   i.e.,   A = Σ_{i=1..r} λi ui vi^T
SVD - Interpretation #2
'spectral decomposition' of the matrix:

A = u1 λ1 v1^T + u2 λ2 v2^T + … (r terms; each term is an n x 1 column times a 1 x m row)
SVD - Interpretation #2
approximation / dimensionality reduction: keep only the first few terms (Q: how many?)

A ~ u1 λ1 v1^T + u2 λ2 v2^T + …   (assume λ1 >= λ2 >= …)

To do the mapping you use V^T: X' = V^T X
SVD - Interpretation #2
A (heuristic [Fukunaga]): keep 80-90% of the 'energy' (= sum of squares of the λi's), assuming λ1 >= λ2 >= …
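The energy heuristic can be sketched as a small helper (the function name `choose_k` is an assumption of the sketch, not from the slides):

```python
import numpy as np

def choose_k(singular_values, energy=0.9):
    """Smallest k such that the first k values keep >= `energy`
    of the total 'energy' (sum of squares)."""
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

s = [9.64, 5.29]   # from the running example
# 9.64^2 / (9.64^2 + 5.29^2) ~ 0.77 < 0.9, so both terms are needed
assert choose_k(s, energy=0.9) == 2
assert choose_k(s, energy=0.7) == 1
```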
SVD - Interpretation #3
finds non-zero 'blobs' in a data matrix (in the running example: the CS block and the MD block).
SVD - Interpretation #3
Drill: find the SVD, 'by inspection'! Q: rank = ??

    1 1 1 0 0
    1 1 1 0 0
    1 1 1 0 0
    0 0 0 1 1
    0 0 0 1 1

= ?? x ?? x ??
SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols), so U and V have 2 columns each, and Λ = diag(??, ??).
SVD - Interpretation #3
A: rank = 2; a first guess:

    1 1 1 0 0       1 0
    1 1 1 0 0       1 0
    1 1 1 0 0   =   1 0   x   ?? 0    x   1 1 1 0 0
    0 0 0 1 1       0 1       0  ??       0 0 0 1 1
    0 0 0 1 1       0 1

(are the columns orthogonal??)
SVD - Interpretation #3
column vectors: are orthogonal - but not unit vectors; normalizing:

    1 1 1 0 0       1/√3 0
    1 1 1 0 0       1/√3 0        ?? 0        1/√3 1/√3 1/√3 0    0
A = 1 1 1 0 0   =   1/√3 0    x   0  ??   x   0    0    0    1/√2 1/√2
    0 0 0 1 1       0    1/√2
    0 0 0 1 1       0    1/√2
![Page 87: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/87.jpg)
SVD - Interpretation #3
and the eigenvalues are 3 and 2:

    1 1 1 0 0       1/√3 0
    1 1 1 0 0       1/√3 0        3 0       1/√3 1/√3 1/√3 0    0
A = 1 1 1 0 0   =   1/√3 0    x   0 2   x   0    0    0    1/√2 1/√2
    0 0 0 1 1       0    1/√2
    0 0 0 1 1       0    1/√2
![Page 88: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/88.jpg)
SVD - Interpretation #3
A: SVD properties:
the matrix product U Σ V^T should give back the matrix A
the matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other
ditto for the matrix V
the matrix Σ should be diagonal, with positive values
![Page 89: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/89.jpg)
SVD - Complexity
O(n * m * m) or O(n * n * m), whichever is less.
Less work if we just want the eigenvalues, or only the first k eigenvectors, or if the matrix is sparse [Berry].
Implemented in any linear algebra package (LINPACK, Matlab, S-plus, Mathematica, ...)
![Page 90: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/90.jpg)
Optimality of SVD
Def: The Frobenius norm of an n x m matrix M is

||M||_F = sqrt( Σ_{i,j} M[i,j]^2 )

(reminder) The rank of a matrix M is the number of independent rows (or columns) of M
Let A = U Σ V^T and A_k = U_k Σ_k V_k^T (the rank-k SVD approximation of A); A_k is an n x m matrix, U_k is n x k, Σ_k is k x k, and V_k is m x k
Theorem [Eckart and Young]: Among all n x m matrices C of rank at most k, we have that:

||A - A_k||_F <= ||A - C||_F
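As a quick illustration of the theorem (our sketch, reusing the rank-2 drill matrix from the earlier slides): truncating the SVD to rank k gives the best rank-k approximation, and the Frobenius error is exactly the norm of the discarded singular values.

```python
import numpy as np

# the rank-2 'drill' matrix from the earlier slides
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A)            # singular values: 3, 2, 0, 0, 0

k = 1
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-1 approximation

# Eckart-Young: the error is the norm of the dropped singular values
err = np.linalg.norm(A - A_k, 'fro')
print(round(err, 6))                   # 2.0 (= sigma_2)
```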
![Page 91: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/91.jpg)
Kleinberg’s Algorithm
Main idea: in many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or contain them only a few times). E.g., Harvard: www.harvard.edu; Search Engines: yahoo, google, altavista
Authorities and hubs
![Page 92: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/92.jpg)
Kleinberg’s algorithm
Problem definition: given the web and a query, find the most ‘authoritative’ web pages for this query
Step 0: find all pages containing the query terms (root set)
Step 1: expand by one move forward and backward (base set)
![Page 93: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/93.jpg)
Kleinberg’s algorithm
Step 1: expand by one move forward and backward
![Page 94: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/94.jpg)
Kleinberg’s algorithm
on the resulting graph, give a high score (= ‘authorities’) to nodes that many important nodes point to
give a high importance score (‘hubs’) to nodes that point to good ‘authorities’
hubs → authorities
![Page 95: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/95.jpg)
Kleinberg’s algorithm
observations: a recursive definition! each node (say, the i-th node) has both an authoritativeness score ai and a hubness score hi
![Page 96: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/96.jpg)
Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: A(i,j) is 1 if the edge from i to j exists, and 0 otherwise
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores.
Then:
![Page 97: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/97.jpg)
Kleinberg’s algorithm
Then: ai = hk + hl + hm
that is, ai = Σ hj over all j such that the edge (j,i) exists
or: a = A^T h

(figure: nodes k, l, m each pointing to node i)
![Page 98: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/98.jpg)
Kleinberg’s algorithm
symmetrically, for the ‘hubness’:
hi = an + ap + aq
that is, hi = Σ aj over all j such that the edge (i,j) exists
or: h = A a

(figure: node i pointing to nodes n, p, q)
![Page 99: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/99.jpg)
Kleinberg’s algorithm
In conclusion, we want vectors h and a such that:
h = A a
a = A^T h
Recall the SVD properties:
C(2): A [n x m] v1 [m x 1] = σ1 u1 [n x 1]
C(3): u1^T A = σ1 v1^T
![Page 100: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/100.jpg)
Kleinberg’s algorithm
In short, the solutions to
h = A a
a = A^T h
are the left- and right- eigenvectors of the adjacency matrix A.
Starting from a random a’ and iterating, we’ll eventually converge
(Q: to which of all the eigenvectors? why?)
![Page 101: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/101.jpg)
Kleinberg’s algorithm
(Q: to which of all the eigenvectors? why?)
A: to the ones of the strongest eigenvalue, because of property B(5):
B(5): (A^T A)^k v’ ~ (constant) v1
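The iteration a = A^T h, h = A a (with normalization, so property B(5) drives it to the strongest eigenvector) can be sketched in a few lines; the function name and toy graph below are ours:

```python
import numpy as np

def hits(A, iters=50):
    """Power iteration for hub (h) and authority (a) scores;
    A is the adjacency matrix."""
    n = A.shape[0]
    a = np.ones(n)
    h = np.ones(n)
    for _ in range(iters):
        a = A.T @ h              # a = A^T h
        h = A @ a                # h = A a
        a /= np.linalg.norm(a)   # normalize so the scores don't blow up
        h /= np.linalg.norm(h)
    return h, a

# toy graph: pages 0 and 1 both point to page 2
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
h, a = hits(A)
# page 2 gets the authority score; pages 0 and 1 are the (equal) hubs
```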
![Page 102: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/102.jpg)
Kleinberg’s algorithm - results
E.g., for the query ‘java’:
0.328 www.gamelan.com
0.251 java.sun.com
0.190 www.digitalfocus.com (“the java developer”)
![Page 103: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/103.jpg)
Kleinberg’s algorithm - discussion
the ‘authority’ score can be used to find pages ‘similar’ to a page p
closely related to ‘citation analysis’, social networks / ‘small world’ phenomena
![Page 104: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/104.jpg)
google/page-rank algorithm
closely related: The Web is a directed graph of connected nodes
imagine a particle randomly moving along the edges (*)
compute its steady-state probabilities. That gives the PageRank of each page (the importance of this page)
(*) with occasional random jumps
![Page 105: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/105.jpg)
PageRank Definition
Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tm)/C(Tm) )
![Page 106: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/106.jpg)
google/page-rank algorithm
Compute the PR of each page - an identical problem: given a Markov Chain, compute the steady state probabilities p1 ... p5

(figure: 5-node example graph, nodes 1-5)
![Page 107: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/107.jpg)
Computing PageRank
Iterative procedure. Also: navigate the web by randomly following links, or with probability p jump to a random page. Let A be the adjacency matrix (n x n) and di the out-degree of page i:

Prob(Ai -> Aj) = p/n + (1 - p) Aij/di
A’[i,j] = Prob(Ai -> Aj)
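The matrix A’[i,j] = p/n + (1-p) Aij/di can be built directly; a small NumPy sketch (function name and toy web are ours; it assumes every page has at least one out-link):

```python
import numpy as np

def transition_matrix(A, p=0.15):
    """Row-stochastic matrix A': with probability p jump to a random
    page, otherwise follow one of the page's out-links uniformly."""
    n = A.shape[0]
    d = A.sum(axis=1, keepdims=True)   # out-degrees d_i (assumed > 0)
    return p / n + (1 - p) * A / d

# toy 3-page web: 0 -> {1, 2}, 1 -> {0}, 2 -> {1}
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
Ap = transition_matrix(A)
print(np.allclose(Ap.sum(axis=1), 1))  # True: every row sums to 1
```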
![Page 108: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/108.jpg)
google/page-rank algorithm
Let A’ be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)

(figure: the 5-node example graph and its transition matrix, with entries 1, 1/2, ..., multiplying the vector [p1 ... p5])
![Page 109: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/109.jpg)
google/page-rank algorithm
A p = p

(figure: the same 5-node example and transition matrix, now read as the eigenvector equation)
![Page 110: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/110.jpg)
google/page-rank algorithm
A p = p
thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized)
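Power iteration finds exactly this eigenvector; a sketch on a toy 2-state chain (ours, not the slides’ 5-node example):

```python
import numpy as np

def steady_state(Ap, iters=100):
    """Power iteration for the steady-state vector of the
    row-stochastic matrix Ap (the fixed point of A'^T p = p)."""
    n = Ap.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p = Ap.T @ p          # one step of the random walk
        p /= p.sum()          # keep p a probability vector
    return p

# two-state chain: from state 0 always go to 1; from 1, 50/50
Ap = np.array([[0.0, 1.0],
               [0.5, 0.5]])
p = steady_state(Ap)
print(np.round(p, 3))  # [0.333 0.667]
```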
![Page 111: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/111.jpg)
Kleinberg/google - conclusions
SVD helps in graph analysis:
hub/authority scores: strongest left- and right- eigenvectors of the adjacency matrix
random walk on a graph: steady state probabilities are given by the strongest eigenvector of the transition matrix
![Page 112: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/112.jpg)
Conclusions – so far
SVD: a valuable tool
given a document-term matrix, it finds ‘concepts’ (LSI)
... and can reduce dimensionality (KL)
![Page 113: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/113.jpg)
Conclusions cont’d
... and can find fixed-points or steady-state probabilities (google/ Kleinberg/ Markov Chains)
... and can solve over- and under-constrained linear systems optimally (least squares)
![Page 114: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/114.jpg)
References
Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl. World Wide Web Conf.
Kleinberg, J. (1998). Authoritative Sources in a Hyperlinked Environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
![Page 115: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/115.jpg)
Embeddings
Given a metric distance matrix D, embed the objects in a k-dimensional vector space using a mapping F such that D(i,j) is close to D’(F(i),F(j))
Isometric mapping: exact preservation of distances
Contractive mapping: D’(F(i),F(j)) <= D(i,j)
D’ is some Lp measure
![Page 116: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/116.jpg)
PCA
Intuition: find the axis that shows the greatest variation, and project all points onto this axis

(figure: 2-d points with original axes f1, f2 and principal directions e1, e2)
![Page 117: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/117.jpg)
SVD: The mathematical formulation
Normalize the dataset by moving the origin to the center of the dataset
Find the eigenvectors of the data (or covariance) matrix
These define the new space
Sort the eigenvalues in “goodness” order

(figure: the same 2-d example, axes f1, f2 and eigenvectors e1, e2)
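The steps above amount to PCA via SVD; a minimal sketch (the function name and toy data are ours):

```python
import numpy as np

def pca(X, k):
    """PCA via SVD: center the data, use the top-k right singular
    vectors as the new axes, and project onto them."""
    Xc = X - X.mean(axis=0)                  # move the origin to the center
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # coordinates along e1..ek

# points near the line y = 2x: one axis carries almost all the variation
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=100)])
Y = pca(X, 1)
print(Y.shape)  # (100, 1)
```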
![Page 118: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/118.jpg)
SVD Cont’d
Advantages:
Optimal dimensionality reduction (for linear projections)
Disadvantages:
Computationally expensive… but can be improved with random sampling
Sensitive to outliers and non-linearities
![Page 119: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/119.jpg)
FastMap
What if we have a finite metric space (X, d)? Faloutsos and Lin (1995) proposed FastMap as a metric analogue of the KL-transform (PCA). Imagine that the points are in a Euclidean space.
Select two pivot points xa and xb that are far apart.
Compute a pseudo-projection of the remaining points along the “line” xaxb.
“Project” the points onto an orthogonal subspace and recurse.
![Page 120: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/120.jpg)
Selecting the Pivot Points
The pivot points should lie along the principal axes, and hence should be far apart.
Select any point x0. Let x1 be the point furthest from x0. Let x2 be the point furthest from x1. Return (x1, x2).

(figure: points x0, x1, x2)
![Page 121: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/121.jpg)
Pseudo-Projections
Given pivots (xa, xb), for any third point y, we use the law of cosines to determine the position cy of y along the line xa-xb:

d(b,y)^2 = d(a,y)^2 + d(a,b)^2 - 2 cy d(a,b)

Solving for cy, the pseudo-projection for y is

cy = ( d(a,y)^2 + d(a,b)^2 - d(b,y)^2 ) / ( 2 d(a,b) )

This is the first coordinate.

(figure: triangle xa, xb, y with sides d(a,y), d(b,y), d(a,b) and projection cy)
![Page 122: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/122.jpg)
“Project to orthogonal plane”
Given the distances along xa-xb, we can compute distances within the “orthogonal hyperplane” using the Pythagorean theorem:

d’(y’, z’) = sqrt( d(y, z)^2 - (cz - cy)^2 )

Using d’(.,.), recurse until k features are chosen.

(figure: points y, z, their projections y’, z’ on the orthogonal plane, and the distances d(y,z), d’(y’,z’), cz - cy)
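Putting the pivot selection, the law-of-cosines pseudo-projection, and the recursion together, a compact FastMap sketch (helper names are ours; illustrative only):

```python
import math

def fastmap(objects, d, k):
    """Sketch of FastMap (Faloutsos & Lin, 1995): embed `objects`,
    given only a distance function d, into k dimensions."""
    coords = {o: [] for o in objects}

    def dist(y, z):
        # residual distance: original distance minus what the coordinates
        # found so far already explain (Pythagorean theorem)
        done = sum((cy - cz) ** 2 for cy, cz in zip(coords[y], coords[z]))
        return math.sqrt(max(d(y, z) ** 2 - done, 0.0))

    for _ in range(k):
        # pivot heuristic: xa furthest from an arbitrary x0, xb furthest from xa
        x0 = objects[0]
        xa = max(objects, key=lambda o: dist(x0, o))
        xb = max(objects, key=lambda o: dist(xa, o))
        dab = dist(xa, xb)
        # law-of-cosines pseudo-projection of every point onto the line xa-xb
        new = {y: 0.0 if dab == 0 else
               (dist(xa, y) ** 2 + dab ** 2 - dist(xb, y) ** 2) / (2 * dab)
               for y in objects}
        for y in objects:
            coords[y].append(new[y])
    return coords

# toy example: points on a line; 1-d FastMap recovers their spacing
pts = [0.0, 1.0, 3.0, 7.0]
emb = fastmap(pts, lambda a, b: abs(a - b), 1)
```

On this toy input the 1-d embedding preserves all pairwise distances exactly (up to shift and reflection).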
![Page 123: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/123.jpg)
Random Projections
Based on the Johnson-Lindenstrauss lemma. For any 0 < ε < 1/2 and any (sufficiently large) set S of M points in R^n, let k = O(ε^-2 ln M). Then there exists a linear map f: S -> R^k such that, for all u, v in S:
(1 - ε) D(u,v) < D(f(u),f(v)) < (1 + ε) D(u,v)
A random projection achieves this with constant probability.
![Page 124: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/124.jpg)
Random Projection: Application
Set k = O(ε^-2 ln M)
Select k random n-dimensional vectors (one approach: vectors with i.i.d. Gaussian entries of mean 0 and variance 1, i.e., N(0,1))
Project the original points onto the k vectors.
The resulting k-dimensional space approximately preserves the distances with high probability.
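A NumPy sketch of the procedure (ours; the sizes below are arbitrary illustrations, not the lemma’s exact bound):

```python
import numpy as np

rng = np.random.default_rng(42)

n, k, M = 1000, 300, 50
X = rng.normal(size=(M, n))               # M points in R^n

# k random directions with i.i.d. N(0,1) entries, scaled by 1/sqrt(k)
R = rng.normal(size=(n, k)) / np.sqrt(k)
Y = X @ R                                  # project into R^k

def pdist(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))    # all pairwise distances

D, Dp = pdist(X), pdist(Y)
mask = ~np.eye(M, dtype=bool)
distortion = np.abs(Dp[mask] / D[mask] - 1).max()
print(distortion < 0.5)                    # True: distances roughly preserved
```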
![Page 125: Multimedia Indexing and Dimensionality Reduction](https://reader035.vdocuments.site/reader035/viewer/2022081603/56815147550346895dbf68c1/html5/thumbnails/125.jpg)
Random Projection
A very useful technique, especially when used in conjunction with another technique (for example SVD)
Use random projection to reduce the dimensionality from thousands to hundreds, then apply SVD to reduce the dimensionality further