ecs289m spring, 2008 social network models s. felix wu computer science department university of...

ecs289m Spring, 2008

Social Network Models

S. Felix Wu, Computer Science Department, University of California, Davis

[email protected] http://www.cs.ucdavis.edu/~wu/


SOURCE: Brandes, Raab and Wagner (2001)

<http://www.inf.uni-konstanz.de/~brandes/publications/brw-envsd-01.pdf>

Organization Chart

Activities of Actual Advice Seeking

Who is the most powerful? Can you determine that for an OSN?

Real Social Organization

OECD Trade Flows 1981-1992

SOURCE: Lothar Krempel http://www.mpi-fg-koeln.mpg.de/~lk/netvis.html

9-11 Hijackers Network

SOURCE: Valdis Krebs http://www.orgnet.com/

03/14/2008 Davis Social Links

The Web ???

Social Network Analysis

“Structural relationships” as explanations:

• Network

• Formation

• Influence and collective actions


Social Network Analysis

1. Degree Centrality: The number of direct connections a node has. What really matters is where those connections lead to and how they connect the otherwise unconnected.

2. Betweenness Centrality: A node with high betweenness has great influence over what flows in the network. High betweenness indicates important links and potential single points of failure.

3. Closeness Centrality: Measures how close a node is to everyone else. The pattern of its direct and indirect ties lets such a node reach any other node in the network more quickly than anyone else; it has the shortest paths to all others.

4. Eigenvector Centrality: It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
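The first and third measures above can be sketched in a few lines. This is a minimal illustration, assuming an undirected, connected graph stored as an adjacency dictionary; the toy network and the function names are hypothetical and not from the lecture.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    """Number of direct connections of each node."""
    return {v: len(neighbors) for v, neighbors in adj.items()}

def closeness_centrality(adj):
    """(n - 1) / (sum of hop distances); assumes a connected graph."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}

# Hypothetical toy network: hub "a" with a tail a - d - e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
print(degree)
print(closeness)
```

In this toy graph the hub "a" ranks first on both measures; betweenness and eigenvector centrality need more machinery and are left out of the sketch.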


Random Graphs

• G(n, p): n nodes and each edge with prob p


• When p < 1/n, disconnected components

• When p is sufficiently large, 1 giant component

• How about diameter?
  – The maximum distance (in hops) between any two nodes.
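The component behavior of G(n, p) can be observed with a small experiment. The sketch below is an illustration only: the function names, the seed, and the choice n = 400 are assumptions, and the values of p are written as c/n to straddle the 1/n threshold.

```python
import random
from collections import deque

def gnp(n, p, rng):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def largest_component(adj):
    """Size of the largest connected component, found by repeated BFS."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 400
for c in (0.2, 1.0, 4.0):
    size = largest_component(gnp(n, c / n, rng))
    print(f"p = {c}/n: largest component has {size} of {n} nodes")
```

For c well below 1 the largest component should stay tiny, while for c well above 1 a single component should cover most of the nodes.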


Random Graph (Erdős/Rényi)

• Probabilistically, each node has $(N-1)p \approx Z$ direct neighbors

• $Z^D = N$ (D is the diameter)

• $D = \log N / \log Z$

• In two hops, each node will have $Z^2$ neighbors in (equal) probability?
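As a quick numeric check of the diameter estimate, the sketch below simply evaluates D = log N / log Z. It is a heuristic under the assumption that each extra hop multiplies the number of reachable nodes by Z (overlaps between neighborhoods are ignored); the function name and the sample values are illustrative.

```python
import math

def estimated_diameter(n, z):
    """Heuristic D = log N / log Z, obtained from Z**D = N (requires z > 1)."""
    return math.log(n) / math.log(z)

# Illustrative values: one million nodes with about 100 neighbors each.
print(estimated_diameter(10**6, 100))
# Doubling neighborhoods (Z = 2) on 1024 nodes.
print(estimated_diameter(1024, 2))
```

The estimate grows only logarithmically in N, which is why even sparse random graphs have short paths.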


Small World Model

• Low Diameter
  – Logarithmic or poly-logarithmic in N

• “High” Cluster Coefficient
  – Cluster coefficient: the fraction of pairs of X’s neighbors that are directly connected to each other


Cluster Coefficient

• Full mesh network (complete graph): $C_{cluster} = 1$

• Lattice network without triangles: $C_{cluster} = 0$
  – E.g., a linear line

• How about $C_{cluster}$ for a random graph?
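The two extreme cases above can be verified directly from the definition. This is a minimal sketch, assuming an undirected graph stored as an adjacency dictionary; the helper names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, x):
    """Fraction of pairs of x's neighbors that are themselves linked."""
    neighbors = list(adj[x])
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, x) for x in adj) / len(adj)

def complete_graph(n):
    return {v: set(range(n)) - {v} for v in range(n)}

def path_graph(n):
    return {v: {u for u in (v - 1, v + 1) if 0 <= u < n} for v in range(n)}

print(average_clustering(complete_graph(6)))  # full mesh
print(average_clustering(path_graph(6)))      # linear line
```

For G(n, p), each pair of a node's neighbors is linked independently with probability p, so the expected coefficient is about p, which is small for sparse random graphs.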


Re-wiring (Watts/Strogatz)

Trade-off between D and $C_{cluster}$!

Structured/Clustered
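A rewiring experiment in the Watts/Strogatz style can be sketched as follows. This is a simplified variant under stated assumptions, not necessarily the exact procedure from the original paper: nodes are numbered 0..n-1, k is even, and each lattice edge is redirected to a uniformly chosen non-neighbor with probability beta. All names are illustrative.

```python
import random

def ring_lattice(n, k):
    """Each node links to its k nearest neighbors on a ring (k even, n > k)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, beta, rng):
    """With probability beta, replace edge (u, v) by (u, w) for a random non-neighbor w."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < beta:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def count_edges(adj):
    return sum(len(neighbors) for neighbors in adj.values()) // 2

rng = random.Random(0)
graph = rewire(ring_lattice(20, 4), 0.1, rng)
print(count_edges(graph))  # rewiring moves edges but keeps their number
```

With beta = 0 the structured lattice is returned unchanged; raising beta adds shortcuts that shrink D while gradually destroying the local clustering.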


A Cycle plus a Random Matching

• A dual combinatorial problem:
  – For given integers n and k, find a graph on n vertices with maximum degree k and diameter as small as possible.
  – For given integers k and D, find a graph with bounded degree k and diameter at most D having as many vertices as possible.

• How?


A Cycle plus a Random Matching

• Cycle & random “disjoint” matching (a random perfect matching added to the cycle)

Bollobás/Chung: $\log N < D(G) < \log N + \log\log N$ (logs base 2, up to additive constants)
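The effect of adding a random matching to a cycle can be measured on a small instance. The sketch below is illustrative only: n = 256 and the seed are assumptions, the diameter is computed by brute-force BFS from every node, and the sampled matching may occasionally pair two cycle neighbors.

```python
import random
from collections import deque

def cycle_with_matching(n, rng):
    """n-cycle plus a random perfect matching (n must be even)."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    nodes = list(range(n))
    rng.shuffle(nodes)
    for a, b in zip(nodes[0::2], nodes[1::2]):
        adj[a].add(b)
        adj[b].add(a)
    return adj

def diameter(adj):
    """Largest BFS distance over all source nodes (graph assumed connected)."""
    worst = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

n = 256
plain_cycle = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
print(diameter(plain_cycle))                                # n / 2 for the bare cycle
print(diameter(cycle_with_matching(n, random.Random(0))))   # far smaller, near log2(n)
```

Every node still has degree at most 3, yet the diameter drops from linear in N to roughly logarithmic.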


Degree Centrality

• Degree distribution and the expected number of neighbors
  – Random graph (Poisson distribution)

• Power-law tail for real-world networks
  – $P(k) \sim k^{-r}$
  – Scale-free: invariant to the size of N


Exponential Distribution


Power Law (function or dist.)

$f(x) = a x^k + o(x^k)$

$f(cx) = ?$
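One way to answer the question is to substitute cx directly, assuming c > 0 is a fixed constant:

```latex
f(cx) = a (cx)^k + o\big((cx)^k\big)
      = c^k \, a x^k + o(x^k)
      \approx c^k f(x)
```

Rescaling the argument only multiplies the function by the constant $c^k$, so the shape of the curve is the same at every scale. This is the sense in which power laws are called scale-free.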


Zipf’s law

• Discrete power law

• Ranking in the frequency table
  – {“the” (7%), “of” (3.5%), “and”, …}

• $f(k; s, N) = \dfrac{k^{-s}}{\sum_{n=1}^{N} n^{-s}}$
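The Zipf formula is easy to evaluate numerically. The sketch below is a direct transcription, with s = 1 and N = 1000 chosen purely for illustration.

```python
def zipf_pmf(k, s, n):
    """f(k; s, N) = k**(-s) / sum_{m=1..N} m**(-s)."""
    normalizer = sum(m ** -s for m in range(1, n + 1))
    return k ** -s / normalizer

# Illustrative parameters: exponent s = 1 over N = 1000 ranks.
probs = [zipf_pmf(k, 1.0, 1000) for k in range(1, 1001)]
print(sum(probs))           # the frequencies form a probability distribution
print(probs[0] / probs[1])  # rank 1 versus rank 2
```

With s = 1 the top-ranked item is twice as frequent as the second, matching the “the” (7%) versus “of” (3.5%) example.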



Two Issues about Low Diameters

• Why should there exist short chains of acquaintances linking together arbitrary pairs of strangers?

• Why should arbitrary pairs of strangers be able to find the short chains of acquaintances that link them together?


Kleinberg’s Basic setting


p, q, r

• p: each node has local contacts to all nodes within lattice distance p

• q: number of long-range contacts per node

• r: a long-range contact v of u is chosen with probability proportional to $[d(u,v)]^{-r}$
  – What is the intuition about r?
  – What about r = 0?
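To build intuition about r, the long-range contact distribution can be tabulated on a small grid. This is a sketch under assumptions: the grid is n x n, d(u, v) is the Manhattan lattice distance, and the function name is hypothetical.

```python
def long_range_distribution(u, n, r):
    """Pr[u picks v] proportional to d(u, v)**(-r) on an n x n grid."""
    weights = {}
    for x in range(n):
        for y in range(n):
            if (x, y) != u:
                d = abs(x - u[0]) + abs(y - u[1])  # lattice (Manhattan) distance
                weights[(x, y)] = d ** -r
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

uniform = long_range_distribution((0, 0), 5, 0)
inverse_square = long_range_distribution((0, 0), 5, 2)
print(uniform[(1, 0)], uniform[(4, 4)])
print(inverse_square[(1, 0)], inverse_square[(4, 4)])
```

With r = 0 every other node is equally likely, so long-range links ignore geography entirely; larger r concentrates the links on nearby nodes.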


Kleinberg’s results

A decentralized routing problem
  – For nodes s, t with known lattice coordinates, find a short path from s to t.
  – At any step, can only use local information.
  – Kleinberg suggests a simple greedy algorithm and analyzes it.
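The greedy rule can be sketched as follows: the current message holder forwards to whichever of its contacts is closest to the target in lattice distance. The sketch assumes p = q = 1 on an n x n grid and samples each node's single long-range contact lazily when the message arrives there (equivalent here because the distance to the target strictly decreases, so no node is visited twice). All names are illustrative.

```python
import random

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def sample_long_range(u, n, r, rng):
    """One long-range contact of u, chosen with probability ~ d(u, v)**(-r)."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [manhattan(u, v) ** -r for v in others]
    return rng.choices(others, weights=weights, k=1)[0]

def greedy_route(s, t, n, r, rng):
    """Forward to the known contact closest to t; return the visited path."""
    path = [s]
    current = s
    while current != t:
        x, y = current
        local = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < n and 0 <= y + dy < n]
        contacts = local + [sample_long_range(current, n, r, rng)]
        current = min(contacts, key=lambda v: manhattan(v, t))
        path.append(current)
    return path

rng = random.Random(0)
path = greedy_route((0, 0), (14, 9), 15, 2, rng)
print(len(path) - 1, "hops")
```

Because some local contact always moves one step closer, the route never takes more hops than the lattice distance; the long-range contacts are what can make it much shorter.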


Local Information

• Local contacts

• Coordinates of the target

• The locations and long-range contacts of all nodes that have come in contact with the message.


Results

• If r = 0, expected delivery time is at least $a_0 n^{2/3}$.
  – Lower bound

• If r = 2, p = q = 1: expected delivery time is at most $a_2 (\log n)^2$
  – Martel/Nguyen’s newer results

• $0 \le r < 2$: $\sim a_r n^{(2-r)/3}$

• $r > 2$: $\sim a_r n^{(r-2)/(r-1)}$


Skip Lists

• The basic idea:

• Keep a doubly-linked list of elements
  – Min, max, successor, predecessor: O(1) time
  – Delete is O(1) time, Insert is O(1) + Search time

• During insert, add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)

[Figure: skip list with levels 1 to 3 over the keys 3, 9, 12, 18, 29, 35, 37]
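The basic idea can be sketched in a compact form. This is a simplified illustration, not the slide's exact structure: it uses singly-linked forward pointers instead of a doubly-linked list, numbers levels from 0, and caps the height at an assumed `max_height`; class and method names are hypothetical.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor at level i

class SkipList:
    def __init__(self, p=0.5, max_height=16, rng=None):
        self.p = p
        self.max_height = max_height
        self.rng = rng if rng is not None else random.Random()
        self.head = Node(None, max_height)  # sentinel placed before all keys

    def _random_height(self):
        height = 1
        while height < self.max_height and self.rng.random() < self.p:
            height += 1  # promote to the next level with probability p
        return height

    def _predecessors(self, key):
        """Last node with a smaller key at every level, searching top-down."""
        preds = [self.head] * self.max_height
        node = self.head
        for level in range(self.max_height - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        new = Node(key, self._random_height())
        for level in range(len(new.next)):
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def contains(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def keys(self):
        node, out = self.head.next[0], []
        while node is not None:
            out.append(node.key)
            node = node.next[0]
        return out

skip_list = SkipList(p=0.5, rng=random.Random(0))
for key in [18, 3, 37, 12, 29, 9, 35]:
    skip_list.insert(key)
print(skip_list.keys())
print(skip_list.contains(18), skip_list.contains(20))
```

Insertion is a search for the predecessors followed by constant work per level, which mirrors the "O(1) + Search time" cost listed above.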


Skip Graphs

• Based on “skip list”:
  – A randomized balanced tree structure organized as a tower of increasingly sparse linked lists
  – All nodes join the linked list of level 0
  – For other levels, each node joins with a fixed probability p
  – Each node has 2/(1-p) pointers on average
  – Average search time: $O\!\left(\dfrac{\log n}{(1-p)\log(1/p)}\right)$


Skip Graph:

• Skip List is not suitable for P2P environment
  – No redundancy, hotspot problem
  – Vulnerable to failure and contention

• Skip Graph: Extension of Skip List
  – Level 0 linked list builds a Chord ring
  – Multiple (max $2^i$) lists for level i (i = 1, …, log n)
  – Each node participates in all levels, but in different lists
  – Membership vector m(x): decides which list to join
  – Every node sees its own skip list
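How the membership vector partitions nodes into lists can be sketched as below. The sketch assumes m(x) is a random bit string and that nodes sharing the first i bits of m(x) share a level-i list; the node names, vector length, and function names are illustrative.

```python
import random
from collections import defaultdict

def membership_vectors(names, bits, rng):
    """Assign each node a random bit string m(x) of the given length."""
    return {x: "".join(rng.choice("01") for _ in range(bits)) for x in names}

def lists_at_level(names, m, level):
    """Group the name-sorted nodes by the first `level` bits of m(x)."""
    lists = defaultdict(list)
    for x in sorted(names):
        lists[m[x][:level]].append(x)
    return dict(lists)

names = list("abcdef")
m = membership_vectors(names, 3, random.Random(0))
for level in range(4):
    print(level, lists_at_level(names, m, level))
```

Level 0 is a single list holding every node, level i has at most $2^i$ lists, and each node appears in exactly one list per level, so the lists a given node belongs to form its own skip list.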


Degree Optimal P2P Routing

• Different routing schemes
  – Viceroy [MNR02]: emulates the butterfly network
    • Constant degree
    • O(log n) hops for routing
  – Constructions emulating De Bruijn graphs
    • Can achieve any degree/number-of-hops tradeoff
    • In particular, degree O(log n) and O(log n / log log n) hops

• Routing is not greedy
  – Recent construction [AM] fixes that.
  – Even if target and source are close in label space, the message might be routed away

• No (natural) prefix search
  – Random keys are necessary.


Skip Graphs [AS02], [HDJ+03]

• Each node (resource) has a name.

• Nodes are arranged on a line sorted by name.

• Each node chooses a random string of bits.

• An edge is established if two nodes share a prefix which is not shared by the nodes between them.

• Allows prefix search.

[Figure: nodes a, b, c, d, e, f on a line, each labeled with a random bit string]


Routing in Skip Graphs

• Greedy Routing: use the longest edge possible.

• Path length is $\Theta(\log n)$ w.h.p.

• The NoN algorithm optimizes over two hops.

[Figure: the same skip graph with its random bit strings]

Theorem: Using the NoN algorithm, the expected path length of any lookup is .


Kleinberg’s Lattice Model

• Graph embedded in a metric space (e.g., 2D lattice)

• “Search efficiently” using only local information + long-range contact(s)
  – Long-range contact probability $\sim [d(u,v)]^{-r}$
  – r = 2, a special case


Some Extensions

• Hierarchical Network Models

• Group Structure Models

• Constant Number of Out-Links

“Small World Phenomena and the Dynamics of Information” by J. Kleinberg, NIPS, 2001


Generation & Search

• There is a data structure behind and among all the social peers
  – Lattice, Tree, Group/Community

• The link probability depends on this “social data structure”
  – And it is used to generate the social network

• Searching may use “direct contacts” plus the knowledge about the social data structure


Hierarchical Network Models

• Representation
  – a complete b-ary tree, T
  – All social nodes are “leaves”

• Distance and Link Probability
  – $h(v,w)$ = the height of the least common ancestor of v and w in T
  – probability proportional to $f(h(v,w))$
  – normalization in probability: $\dfrac{f(h(v,w))}{\sum_{x \neq v} f(h(v,x))}$
  – out-degree in graph: $k = c \log^2 n$


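The Critical Value

$$\lim_{h \to \infty} \frac{f(h)}{b^{-\alpha' h}} = 0, \quad \forall \alpha' < \alpha$$

$$\lim_{h \to \infty} \frac{b^{-\alpha'' h}}{f(h)} = 0, \quad \forall \alpha'' > \alpha$$

$$f(h(v,w)) \sim b^{-\alpha h(v,w)}$$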
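The hierarchical link probability can be computed explicitly on a small tree. This sketch assumes the leaves of a complete b-ary tree are numbered 0..n-1 from left to right (so integer division by b moves one level up) and takes $f(h) = b^{-\alpha h}$ exactly; the function names are hypothetical.

```python
def lca_height(v, w, b):
    """Height of the least common ancestor of leaves v and w in a complete b-ary tree."""
    h = 0
    while v != w:
        v, w = v // b, w // b  # move both leaves up one level
        h += 1
    return h

def link_probabilities(v, n, b, alpha):
    """Pr[v links to w] = b**(-alpha * h(v, w)) / Z over all leaves w != v."""
    weights = {w: b ** (-alpha * lca_height(v, w, b)) for w in range(n) if w != v}
    z = sum(weights.values())
    return {w: weight / z for w, weight in weights.items()}

# Illustrative instance: binary tree with 8 leaves, alpha = 1.
probs = link_probabilities(0, 8, 2, 1)
print(lca_height(0, 1, 2), lca_height(0, 7, 2))
print(probs)
```

With alpha = 1, each height contributes the same total weight to the normalization (there are $(b-1)b^{j-1}$ leaves at height j, each weighted $b^{-j}$), so a node spreads its links evenly across the levels of the hierarchy.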


Interpretation (1)

• /Science/Computer_Science/Algorithms

• /Arts/Music/Opera

• /Science/Computer_Science/Machine_Learning


Interpretation (2)

• Target: “stock broker @ Boston, MA”

• Next hop:
  – “bishop @ Cambridge, MA”
  – “banker @ New York City, NY”


Results

• $\alpha = 1 \Rightarrow O(\log n)$

• Otherwise, no polylogarithmic search


How to Search in HNM??

• Link weight: $f(h(v,w)) \sim b^{-h(v,w)}$

• Normalized link probability: $\dfrac{f(h(v,w))}{\sum_{x \neq v} f(h(v,x))}$

• Out-degree: $k = c \log^2 n$


Useful Neighbor

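Routing from v to t ($v \to t$), with $v, t \in T$:

• $\mathrm{commonAncestor}(v, t) = u$

• $\mathrm{Height}(T') = i$, $u \in T'$, $\mathrm{root}(T') = u$

• $\mathrm{Height}(T'') = i - 1$, $t \in T'' \wedge v \notin T''$

Is “v” useful to reach “t”?

[Figure: tree T with leaves v and t]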



Useful Neighbor Recursively

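Routing from v to t ($v \to t$), with $v, t \in T$:

• $\mathrm{Height}(T'') = i - 1$, $t \in T'' \wedge v \notin T''$

[Figure: tree T with nested subtrees T' (rooted at u) and T''; labeled nodes v, u, w, t]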


Search

• Find one “useful” neighbor in G as the next step

• What happens if NO useful neighbor?

• Expected steps to reach “t”.


Probability to have 1 U.N.

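Normalization constant:

$$Z = \sum_{x \neq v} b^{-h(v,x)} = \sum_{j=1}^{\log n} (b-1)\, b^{j-1}\, b^{-j} \le \log n$$

One leaf in $T''$: a given out-link of v ends there with probability at least

$$\frac{b^{-i}}{\log n}$$

$T''$ has $b^{i-1}$ leaves, so a given out-link ends in $T''$ with probability at least

$$\frac{b^{i-1} \times b^{-i}}{\log n} = \frac{1}{b \log n}$$

All out-links: the probability that none of them ends in $T''$ is at most

$$\left(1 - \frac{1}{b \log n}\right)^{c \log^2 n} \le n^{-\theta}$$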


HNM

• High probability to be useful

• How about “constant links”?


Group Structures

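• R is a group; R’ is a strictly smaller subgroup (property 1):

$$q = |R| \ge 2,\ v \in R \Rightarrow \exists R' : (v \in R' \subseteq R) \wedge (q = |R| > |R'| > \lambda q)$$

• $R_1, R_2, R_3, \ldots$ all contain v, then (property 2):

$$\forall i,\ (|R_i| \le q) \wedge (v \in R_i) \Rightarrow \left|\bigcup_i R_i\right| \le \beta q$$

• q(v,w): minimum size of a group containing both v and w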


How to Search in Group Structure??

• Link weight: $f(q(v,w)) \sim q(v,w)^{-\alpha}$

• Normalized link probability: $\dfrac{f(q(v,w))}{\sum_{x \neq v} f(q(v,x))}$

• Out-degree: $k = c \log^2 n$


Idea

• Pair (v, t)

• R is the minimum-sized group containing both v and t.

• With property (1):

$$q = |R| \ge 2,\ v \in R \Rightarrow \exists R' : (v \in R' \subseteq R) \wedge (q = |R| > |R'| > \lambda q)$$

• Then:

$$\exists R' : (t \in R') \wedge (\lambda^2 |R| < |R'| < \lambda |R|)$$

How to define “usefulness” of v?


Usefulness of v

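• Pair (v, t)

• R is the minimum-sized group containing both v and t.

• With property (1):

$$q = |R| \ge 2,\ v \in R \Rightarrow \exists R' : (v \in R' \subseteq R) \wedge (q = |R| > |R'| > \lambda q)$$

• Then:

$$\exists R' : (t \in R') \wedge (\lambda^2 |R| < |R'| < \lambda |R|)$$

• v is useful if it has a link into R’:

$$\exists x,\ (l(v, x) = 1) \wedge (x \in R')$$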






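$$Z = \sum_{x \neq v} \frac{1}{q(v,x)} \le \sum_{j=1}^{\log_\beta n} \beta^{j+1}\, \beta^{-(j-1)} = \beta^2 \log_\beta n$$

$$\left(1 - \frac{\lambda^2}{\beta^2 \log_\beta n}\right)^{c \log^2 n} \le n^{-\theta}$$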


Results

• $\alpha = 1 \Rightarrow O(\log n)$

• Otherwise, no polylogarithmic search


Fixed Number of Out-Links

• Relax “t” to “a cluster of t”

[Figure: tree T whose leaves are clusters (Cl); nodes v, w and t, x sit inside clusters]

• $m = |L|$

• $r = |\mathrm{Cluster}|$ (r: resolution)

• $n = m \times r$


Question #1

• Why can’t we just treat “Cluster” as “Super Node” and we go home (by applying the HNM results)?



Not necessarily

[Figure: three clusters (Cl) containing the nodes t, x, v, w, p, q]


Probability

$$f(h(v,w)) \sim (h(v,w) + 1)^{-2}\, b^{-h(v,w)}$$

$$Z \le 2r$$


Question #2

• For any out-link of v, what is the probability that the end point of the out-link is in the same cluster of v?


Answer

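Within the same cluster, $h = 0$, so each of the r nodes has weight

$$(0 + 1)^{-2}\, b^{-0} = 1$$

and the probability that an out-link stays in the cluster is

$$\frac{1 \times r}{Z} \ge \frac{r}{2r} = \frac{1}{2}$$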


Results

• If the resolution is polylogarithmic, then the search is polylogarithmic if $\alpha = 1$.


A “Similar” Process

[Figure: tree T with nested subtrees T' and T''; labeled nodes v, u, w, t]

Coloring the Links


Reading

• “Small World Phenomena and the Dynamics of Information” by J. Kleinberg, NIPS, 2001
