SimRank : A Measure of Structural-Context Similarity
Advisor : Dr. Hsu
Graduate : Sheng-Hsuan Wang
Authors : Glen Jeh, Jennifer Widom
Outline
Motivation
Objective
Introduction
Basic Graph Model
SimRank
Random Surfer-Pairs Model
Future Work
Personal Opinion
Motivation
The problem of measuring “similarity” of objects arises in many applications.
Objective
A general approach, applicable in any domain with object-to-object relationships.
Two objects are similar if they are related to similar objects.
Introduction
Basic Graph Model
We model objects and relationships as a directed graph G=(V,E).
For a node v in the graph, we denote by I(v) and O(v) the sets of in-neighbors and out-neighbors of v, respectively.
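As a minimal sketch of this graph model, the in-neighbor and out-neighbor sets can be built from an edge list. The function name `build_neighbors` and the toy edge list are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Return (I, O): dicts mapping each node to its in- and out-neighbor sets."""
    I = defaultdict(set)  # I[v] = nodes with an edge pointing to v
    O = defaultdict(set)  # O[v] = nodes that v points to
    for p, q in edges:    # each edge (p, q) means p -> q
        O[p].add(q)
        I[q].add(p)
    return I, O

# Hypothetical toy graph, used only for illustration.
edges = [
    ("Univ", "ProfA"), ("Univ", "ProfB"),
    ("ProfA", "StudentA"), ("ProfB", "StudentB"),
    ("StudentA", "Univ"), ("StudentB", "ProfB"),
]
I, O = build_neighbors(edges)
print(sorted(I["ProfB"]))  # in-neighbors of ProfB
print(sorted(O["Univ"]))   # out-neighbors of Univ
```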
SimRank
Basic SimRank Equation
If a=b then s(a,b) is defined to be 1. Otherwise,
s(a,b) = \frac{C}{|I(a)||I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s(I_i(a), I_j(b))    (1)
where C is a constant between 0 and 1. Set s(a,b)=0 when I(a)=\emptyset or I(b)=\emptyset.
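A small worked instance may help make the equation concrete. The graph here is a hypothetical one chosen for illustration: node a has the single in-neighbor c, node b has in-neighbors c and d, neither c nor d has any in-neighbor (so s(c,d)=0), and C=0.8 is an assumed value.

```latex
s(a,b) = \frac{C}{|I(a)||I(b)|} \bigl[ s(c,c) + s(c,d) \bigr]
       = \frac{0.8}{1 \cdot 2} \, (1 + 0)
       = 0.4
```

The shared in-neighbor c contributes the full score s(c,c)=1, while the unrelated pair (c,d) contributes nothing, so a and b end up with half of the maximum possible score C.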
Bipartite SimRank
Two types of objects. Example : Shopping graph G.
Let s(A,B) denote the similarity between persons A and B, for A ≠ B:
s(A,B) = \frac{C_1}{|O(A)||O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} s(O_i(A), O_j(B))    (2)
Let s(c,d) denote the similarity between items c and d, for c ≠ d:
s(c,d) = \frac{C_2}{|I(c)||I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} s(I_i(c), I_j(d))    (3)
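The two mutually recursive equations can be sketched as a simple fixed-point iteration. This is an illustrative sketch, not the authors' implementation: the function name `bipartite_simrank`, the default constants, the iteration count, and the toy purchase data are all assumptions.

```python
def bipartite_simrank(purchases, C1=0.8, C2=0.8, iterations=10):
    """purchases maps each person to the set of items bought, i.e. O(person).

    Returns (sp, si): person-person and item-item similarity dicts.
    """
    persons = sorted(purchases)
    items = sorted({c for bought in purchases.values() for c in bought})
    # I(c): the persons who bought item c.
    buyers = {c: {A for A in persons if c in purchases[A]} for c in items}

    sp = {(A, B): 1.0 if A == B else 0.0 for A in persons for B in persons}
    si = {(c, d): 1.0 if c == d else 0.0 for c in items for d in items}

    for _ in range(iterations):
        new_sp, new_si = {}, {}
        for A in persons:              # equation (2): persons via their items
            for B in persons:
                OA, OB = purchases[A], purchases[B]
                if A == B:
                    new_sp[A, B] = 1.0
                elif not OA or not OB:
                    new_sp[A, B] = 0.0
                else:
                    total = sum(si[c, d] for c in OA for d in OB)
                    new_sp[A, B] = C1 * total / (len(OA) * len(OB))
        for c in items:                # equation (3): items via their buyers
            for d in items:
                Ic, Id = buyers[c], buyers[d]
                if c == d:
                    new_si[c, d] = 1.0
                else:
                    total = sum(sp[A, B] for A in Ic for B in Id)
                    new_si[c, d] = C2 * total / (len(Ic) * len(Id))
        sp, si = new_sp, new_si
    return sp, si

# Hypothetical shopping data, for illustration only.
shopping = {"A": {"sugar", "eggs"}, "B": {"eggs", "flour"}}
sp, si = bipartite_simrank(shopping)
print(sp["A", "B"], si["sugar", "flour"])
```

Both score tables are updated from the previous iteration's values, so the person scores and item scores reinforce each other round by round.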
Computing SimRank - Naive Method
Start from R_0(a,b) = 0 if a ≠ b, and R_0(a,b) = 1 if a = b. R_0(a,b) is a lower bound on the actual SimRank score s(a,b).
To compute R_{k+1}(a,b) from R_k(*,*):
R_{k+1}(a,b) = \frac{C}{|I(a)||I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b))    (4)
for a ≠ b, and R_{k+1}(a,b) = 1 for a = b.
The space required is simply O(n^2) to store the results R_k.
The time required is O(K n^2 d_2), where K is the number of iterations and d_2 is the average of |I(a)||I(b)| over all node pairs (a,b).
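The naive iteration can be sketched directly from the update rule. This is a minimal illustration under stated assumptions: the name `simrank_naive`, the defaults C=0.8 and K=5, and the toy graph are not from the slides, and every node is assumed to appear as a key of `in_neighbors`.

```python
def simrank_naive(in_neighbors, C=0.8, K=5):
    """Run K rounds of the naive update, storing all n^2 node-pair scores.

    in_neighbors maps every node to its set of in-neighbors I(v).
    """
    nodes = sorted(in_neighbors)
    # R_0: 1 on the diagonal, 0 elsewhere.
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            for b in nodes:
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    R_next[a, b] = 1.0
                elif not Ia or not Ib:
                    R_next[a, b] = 0.0   # no in-neighbors: score defined as 0
                else:
                    total = sum(R[i, j] for i in Ia for j in Ib)
                    R_next[a, b] = C * total / (len(Ia) * len(Ib))
        R = R_next
    return R

# Hypothetical graph: c -> a, c -> b, d -> b.
graph = {"a": {"c"}, "b": {"c", "d"}, "c": set(), "d": set()}
scores = simrank_naive(graph)
print(scores["a", "b"])
```

The dictionary holds one entry per ordered node pair, which is where the O(n^2) space comes from, and the inner sum over in-neighbor pairs is the d_2 factor in the running time.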
Computing SimRank - Pruning
Set the similarity between two nodes that are far apart to 0.
Consider node-pairs only for nodes which are near each other.
If each node has on average d_r neighbors within radius r, there will be n d_r node-pairs.
The time and space complexities become O(K n d_r d_2) and O(n d_r) respectively.
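One way to sketch the pruning idea is to enumerate only the node-pairs within radius r and treat every other pair as having similarity 0. The helper names `nearby_pairs` and `simrank_pruned`, the choice to measure the radius in the underlying undirected graph, and the toy graph are assumptions made for this illustration.

```python
from collections import deque

def nearby_pairs(in_neighbors, r):
    """Unordered pairs of distinct nodes within undirected distance r."""
    adj = {v: set() for v in in_neighbors}
    for v, preds in in_neighbors.items():
        for u in preds:
            adj[v].add(u)
            adj[u].add(v)
    pairs = set()
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # breadth-first search, cut off at r
            v = queue.popleft()
            if dist[v] == r:
                continue
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        pairs.update(frozenset((src, w)) for w in dist if w != src)
    return pairs

def simrank_pruned(in_neighbors, r=2, C=0.8, K=5):
    """SimRank iteration restricted to nearby pairs; other pairs count as 0."""
    pairs = nearby_pairs(in_neighbors, r)
    R = {p: 0.0 for p in pairs}

    def score(i, j):
        if i == j:
            return 1.0
        return R.get(frozenset((i, j)), 0.0)  # pruned pairs default to 0

    for _ in range(K):
        R_next = {}
        for p in pairs:
            a, b = tuple(p)
            Ia, Ib = in_neighbors[a], in_neighbors[b]
            if Ia and Ib:
                total = sum(score(i, j) for i in Ia for j in Ib)
                R_next[p] = C * total / (len(Ia) * len(Ib))
            else:
                R_next[p] = 0.0
        R = R_next
    return R

# Hypothetical graph: c -> a, c -> b, d -> b, plus an isolated node z.
graph = {"a": {"c"}, "b": {"c", "d"}, "c": set(), "d": set(), "z": set()}
pruned = simrank_pruned(graph, r=2)
print(pruned[frozenset(("a", "b"))])
```

Only the pairs returned by `nearby_pairs` are stored, so the table size follows the number of nearby pairs rather than n^2.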
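Random Surfer-Pair Model
Expected Distance
Let H be any strongly connected graph.
Let u,v be any two nodes in H. We define the expected distance d(u,v) from u to v as
d(u,v) = \sum_{t : u \rightsquigarrow v} P[t] \, l(t)    (5)
where the sum is over all tours t that start at u and end at v without touching v before the end, P[t] is the probability of traveling t, and l(t) is the length of t.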
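For a surfer who picks an out-edge uniformly at random at each step, the sum over tours equals the expected number of steps to first reach v, which can be approximated by repeatedly applying a one-step recurrence. The sketch below assumes that uniform choice; the name `expected_distance`, the sweep count, and the toy graph are illustrative.

```python
def expected_distance(out_neighbors, u, v, sweeps=500):
    """Approximate d(u, v) for a surfer choosing out-edges uniformly at random.

    Uses h(v) = 0 and h(w) = 1 + average of h over O(w), iterated `sweeps` times.
    Assumes the graph is strongly connected, so every node has an out-edge.
    """
    h = {w: 0.0 for w in out_neighbors}
    for _ in range(sweeps):
        h = {
            w: 0.0 if w == v
            else 1.0 + sum(h[x] for x in out_neighbors[w]) / len(out_neighbors[w])
            for w in out_neighbors
        }
    return h[u]

# Hypothetical strongly connected graph: 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 0.
H = {0: [1], 1: [0, 2], 2: [0]}
d_02 = expected_distance(H, 0, 2)
print(round(d_02, 6))
```

In this toy graph the surfer at node 1 returns to node 0 half of the time, which is why the expected distance from 0 to 2 is larger than the shortest-path length of 2.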
Expected Meeting Distance (EMD)
m(a,b) = \sum_{t : (a,b) \rightsquigarrow (x,x)} P[t] \, l(t)    (6)
Expected-f Meeting Distance
To circumvent the “infinite EMD” problem, map all distances to a finite interval.
Exponential function f(z) = c^z, where c \in (0,1) is a constant.
s(a,b) = \sum_{t : (a,b) \rightsquigarrow (x,x)} P[t] \, c^{l(t)}    (7)
Equivalence to SimRank
Theorem. The SimRank score, with parameter C, between two nodes is their expected-f meeting distance traveling back-edges, for f(z) = C^z.
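The theorem can be checked numerically on a small graph by computing both quantities independently: one function follows two surfers walking backward in lock-step and accumulates C^t times the probability of first meeting at step t, the other runs the SimRank iteration. The function names, the truncation at 20 steps, and the toy graph are assumptions made for this sketch.

```python
def expected_f_meeting(in_neighbors, a, b, C=0.8, steps=20):
    """Sum of P[t] * C**l(t) over backward tours of length <= steps ending at a meeting."""
    if a == b:
        return 1.0                      # tour of length 0, and C**0 = 1
    dist = {(a, b): 1.0}                # probability of each not-yet-met pair
    score = 0.0
    for t in range(1, steps + 1):
        nxt = {}
        for (x, y), p in dist.items():
            Ix, Iy = in_neighbors[x], in_neighbors[y]
            if not Ix or not Iy:
                continue                # a surfer is stuck, so no meeting follows
            share = p / (len(Ix) * len(Iy))
            for i in Ix:
                for j in Iy:
                    if i == j:
                        score += share * C ** t   # first meeting at step t
                    else:
                        nxt[i, j] = nxt.get((i, j), 0.0) + share
        dist = nxt
    return score

def simrank_iter(in_neighbors, C=0.8, K=20):
    """Plain SimRank iteration, used here only as the comparison baseline."""
    nodes = sorted(in_neighbors)
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            for b in nodes:
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    R_next[a, b] = 1.0
                elif not Ia or not Ib:
                    R_next[a, b] = 0.0
                else:
                    total = sum(R[i, j] for i in Ia for j in Ib)
                    R_next[a, b] = C * total / (len(Ia) * len(Ib))
        R = R_next
    return R

# Hypothetical graph containing cycles, given as in-neighbor sets.
g = {
    "Univ": {"StudentA"}, "ProfA": {"Univ"}, "ProfB": {"Univ", "StudentB"},
    "StudentA": {"ProfA"}, "StudentB": {"ProfB"},
}
R = simrank_iter(g)
gap = max(abs(R[a, b] - expected_f_meeting(g, a, b)) for a in g for b in g)
print(gap)
```

With the same number of steps and iterations the two computations cover the same set of tours, so the printed gap should be at the level of floating-point rounding.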
Future Work
Divide and conquer, then merge: divide a corpus into chunks…
Ternary (or more) relationships.
Personal Opinion
We believe that the intuition behind SimRank can be applied in many domains that are based on object-to-object relationships.