pagerank, centrality measures and consensus: a systems and ...assets.disi.unitn.it › ... ›...

27
1 ICT International Doctoral School, Trento @RT 2015 PageRank, Centrality Measures and Consensus: A Systems and Control Viewpoint Roberto Tempo CNR-IEIIT Consiglio Nazionale delle Ricerche Politecnico di Torino [email protected] ICT International Doctoral School, Trento @RT 2015 CNR-IEIIT PageRank Problem ICT International Doctoral School, Trento @RT 2015 CNR-IEIIT Website of Umberto Eco Author of “The Name of the Rose” translated in more than 40 languages Looking for “Umberto Eco” in Google we find the website ICT International Doctoral School, Trento @RT 2015 CNR-IEIIT PageRank is a numerical value in the interval [0,1] Using a PageRank checker we compute PageRank is Google’s view of the importance of this page” “PageRank reflects our view of the importance of Web pages by considering more than 500 million variables and 2 billion terms. Pages that are considered important receive a higher PageRank and are more likely to appear at the top of the search results” PageRank for Umberto Eco ICT International Doctoral School, Trento @RT 2015 CNR-IEIIT PageRank is used in search engines to indicate the importance of the page currently visited PageRank has broader utility than search engines Used in various areas for ranking other “objects” o scientific journals (Eigenfactor) o economic complexity o cancer biology and protein detection o ranking top sport players o PageRank ICT International Doctoral School, Trento @RT 2015 CNR-IEIIT Random Surfer Model Network consisting of servers (nodes) connected by directed communication links Web surfer moves along randomly following the hyperlink structure When arriving at a page with several outgoing links, one is chosen at random, then the random surfer moves to a new page, and so on…

Upload: others

Post on 01-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

1

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank, Centrality Measures and Consensus: A Systems and Control Viewpoint

Roberto Tempo

CNR-IEIIT

Consiglio Nazionale delle Ricerche

Politecnico di Torino

[email protected]

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank Problem

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Website of Umberto Eco

Author of

“The Name of the

Rose” translated

in more than

40 languages

Looking for

“Umberto Eco”

in Google we

find the websiteICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank is a numerical value in the interval [0,1]

Using a PageRank checker we compute

“PageRank is Google’s view of the importance of this page”

“PageRank reflects our view of the importance of Web pages byconsidering more than 500 million variables and 2 billion terms.Pages that are considered important receive a higher PageRankand are more likely to appear at the top of the search results”

PageRank for Umberto Eco

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank is used in search engines to indicate theimportance of the page currently visited

PageRank has broader utility than search engines

Used in various areas for ranking other “objects”

o scientific journals (Eigenfactor)o economic complexityo cancer biology and protein detectiono ranking top sport playerso …

PageRank

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Network consisting of servers (nodes) connected bydirected communication links

Web surfer moves along randomly following thehyperlink structure

When arriving at a page with several outgoing links, oneis chosen at random, then the random surfer moves to anew page, and so on…

Page 2: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

2

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Web representation with incoming and outgoing links

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Pick an outgoing link at random

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Arriving at a new web page

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Pick another outgoing link at random

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

Page 3: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

3

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model

If a page is important then it is visited more often...

The time the random surfer spends on a page is ameasure of its importance

If important pages point to your page, then your pagebecomes important because it is visited often

What is the probability that your page is visited?

Need to rank the pages in order of importance forfacilitating the web search

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Graph Representation

1 2

34

5

Directed graph with nodes (pages) and links representing the web

Graph is not necessarily strongly connected (from 5 you cannot reach other pages)

Graph is constructed using crawlers and spiders moving continuously along the web

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

For each node we count the number of outgoing links and normalize their sum to 1

Hyperlink matrix is a nonnegative (column) substochastic matrix

Hyperlink Matrix

1 2

3 4

5

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Hyperlink Matrix

1 2

3 4

5

0 1 0 0 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 0 0

0 0 1 0 0

A

Page 4: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

4

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Page 5 is a Dangling Node

1 2

34

5

0 1 0 0 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 0 0

0 0 1 0 0

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Dangling nodes are pageshaving no outgoing link

Problem not well-posed

Probability of page 5increases and increases…probability of the otherpages decreases anddecreases…

Eventually we obtain p5 = 1and p1 = p2 = p3 = p4 = 0

Dangling Nodes

1 2

34

5

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Dangling Nodes

Example: pdf file with no

hyperlink

Benchmark: Web Lincoln

University, New Zealand

3756 nodes

31718 total #outgoing links

H. Ishii, R. Tempo (2014)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Dangling Nodes

3255 dangling nodes (85%)

Red dots outgoing

links toward dangling

nodes

Blue dots are normal

links

White area corresponds

to no-links

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Dangling Nodes

Random surfer gets stuck when visiting a pdf file

In this case the “back button” of the browser is used

Easy fix: Add new links to make the matrix stochastic

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Easy Fix: Add New Link

1 2

3 4

5

We add a new outgoinglink from page 5 to page 3

0 1 0 0 0

1/ 3 0 0 1/ 2 0

1/ 3 0 0 1/ 2

1/ 3 0 0 0

0

1

0

0 0 1 0

A

In the benchmark this fix increases the #links from to 31718 to 40646

Page 5: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

5

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Assumption: No Dangling Nodes

This is a web modeling problem

Assume that there are no dangling nodes or add artificiallinks

Hyperlink matrix A is a nonnegative stochastic matrix(instead of substochastic)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Surfer Model and Markov Chains

Random surfer model is represented as a Markov chain

where x(k) is a probability vector

x(k) [0,1]n and ∑i xi(k) = 1

xi(k) represents the importance of the page i at time k

( 1) ( )x k Ax k

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence of the Markov Chain

Question: Does the Markov chain converge to astationary value

x(k) x* for k

representing the probability that the pages are visited?

Answer: No

Example:0 1 0 0

0 0 1 0

0 0 0 1

1 0 0 0

A

0

1(0)

0

0

x

x1(k)

k

1 -

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Teleportation Model

Recall that the matrix A is a nonnegative stochasticmatrix

We introduce a different model

Teleportation: After a while the random surfer gets boredand decides to “jump” to another page not directlyconnected to that currently visited

New page may be geographically or content-basedlocated far away

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Recall the Random Surfer Model

Web representation with incoming and outgoing links

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Recall the Random Surfer Model

Page 6: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

6

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Teleportation Model

We are “teleported” to a web page located far away

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model Again

Pick another outgoing link at random

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Surfer Model Again

Pick another outgoing link at random

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Teleportation Model Again

We are teleported to another web page located far away

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convex Combination of Matrices

Teleportation model is represented as a convexcombination of matrices A and S/n

S = 1 1T is a rank-one matrix

1 vector with all entries equal to one

Consider a matrix M defined as

M = (1 - m) A + m/n S m (0,1)

where n is the number of pages

The value m = 0.15 is used at Google

1 1

1 1

S

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Matrix M

M is a convex combination of two nonnegativestochastic matrices and m (0,1)

M is a strictly positive stochastic matrix

Page 7: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

7

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence of the Markov Chain

Consider the Markov chain

x(k+1) = M x(k)

where M is a strictly positive stochastic matrix

If ∑i xi(0) = 1 convergence is guaranteed by PerronTheorem

x(k) x* for k

x* = M x* = [(1 - m) A + m/n S] x* m (0,1)

Corresponding graph is strongly connected

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank: Bringing Order to the Web

Rank n web pages in order of importance

Ranking is provided by x*

PageRank x* of the hyperlink matrix M is defined as

x*=M x* where x* [0,1]n and ∑i xi* = 1

S. Brin, L. Page (1998)S. Brin, L. Page, R. Motwani, T. Winograd (1999)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank: Bringing Order to the Web

Rank n web pages in order of importance

Ranking is provided by x*

PageRank x* of the hyperlink matrix M is defined as

x*=M x* where x* [0,1]n and ∑i xi* = 1

x* is the stationary distribution of the Markov Chain(steady-state probability that pages are visited is x* )

x* is a nonnegative unit eigenvector corresponding to theeigenvalue 1 of M

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Positive Stochastic Matrices and Perron Theorem

The eigenvalue 1 of M is a simple eigenvalue and anyother eigenvalue (possibly complex) is strictly smallerthan 1 in absolute value

There exists an eigenvector x* of M with eigenvalue 1such that all components of x* are positive

M is irreducible and the corresponding graph is stronglyconnected (for any two vertices i,j there is a sequence ofedges which connects i to j)

Stationary probability vector x* is the eigenvector of M

x* exists and it is unique

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Two vertices i, j V of an undirected graph G V ,E)are connected if there is a path between i and j

A graph is connected if every pair of vertices in thegraph is connected

A directed graph is a graph where the edges have adirection associated with them

Oriented graphs are directed graphs with no self-loops,no multiple adjacencies and no 2-cycles

Undirected, Directed, Oriented Graphs

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

A directed graph is called weakly connected if replacingall its directed edges with undirected edges produces aconnected (undirected) graph

A graph is said to be strongly connected if every vertexis reachable from every other vertex

Weakly and Strongly Connected Graphs

Page 8: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

8

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank Computation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank Computation withPower Method

PageRank is computed with the power method

x(k+1) = M x(k)

If ∑i xi(0) = 1 convergence is guaranteed because M is astrictly positive stochastic matrix (Perron Theorem)

x(k) x* for k

PageRank computation requires 50-100 iterations (40 inthe benchmark)

This computation takes about a week and it is performedcentrally at Google once a month

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Why m = 0.15?

Asymptotic rate of convergence of power method isexponential and depends on the ratio

We have

1(M) = 1 | 2(M) | ≤ 1 - m = 0.85

For Nit= 50 we have 0.8550≈ 2.95 10-4

For Nit= 100 we have 0.85100≈ 8.74 10-8

Larger m implies faster convergence, but numericallyunstable

2

1

| λ |

| λ | 1

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank Computation with Power Method

1 4

23

0.038 0.037 0.037 0.321

0.887 0.037 0.462 0.321

0.037 0.462 0.037 0.321

0.037 0.462 0.462 0.037

M

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

T* 0.12 0.33 0.26 0.29x

0.15m

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Size of the Web

The size of M is more than 8 billion (and it isincreasing)!

Sparsity in the web: 1020 entries

1012 non-zero entries

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Why m = 0.15?Answer 2: Sensitivity

Study sensitivity wrt m taking

We obtain for i = 1, 2,…, n

A deeper analysis shows that if 2(M) is close to 1(M)then x*(m) is very sensitive for small m

Conclusion: m=0.15 is a good compromise

*d( )

d ix mm

*d 1( ) 6.66

d ix mm m

Page 9: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

9

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Columbia River, The Dalles, Oregon

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Distributed Viewpoint

More and more computing power is needed…

… the distributed viewpoint because global informationabout the network is difficult to obtain

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Randomized Decentralized Algorithms

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Main idea: Develop Las Vegas randomized decentralized

algorithms for computing PageRank

Randomized Decentralized Algorithms for PageRank Computation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Monte Carlo and Las Vegas Randomized Algorithms

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Monte Carlo and Las Vegas

Monte Carlo was invented by Metropolis, Ulam, vonNeumann, Fermi, … in the fourties (Manhattan project)

Metropolis Fermi Ulam, Feymann, von Neumann

Las Vegas first appeared in computer science in the lateseventies

Page 10: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

10

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Randomized Algorithm: Definition

Randomized Algorithm (RA): An algorithm that makesrandom choices during its execution to produce a result

Example: Matlab code

set_r =1:0.01:3;for k =1:length(set_r)

if (rand > 0.5) then a_opt(k) = hel(k);else a_opt(k) = 3.7;end if

a_lin(k) =(e/(e-1))*r;a_sub(k) =(a/(a-1))*(r+log(a)-1);

end

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Las Vegas Randomized Algorithm

Las Vegas Randomized Algorithm (LVRA): Arandomized algorithm that always produces correctresults, the only variation from one run to another is therunning time

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Example of Las Vegas: Discrete Random Variables

q1 q2 q3 q4 q5 q6 q7 q8 q9 q10

Consider discrete random variables

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Example of Las Vegas: Discrete Random Variables

q1 q2 q3 q4 q5 q6 q7 q8 q9 q10

Consider discrete random variables

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Example of Las Vegas: Discrete Random Variables

q1 q2 q3 q4 q5 q6 q7 q8 q9 q10

Consider discrete random variables

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Las Vegas Randomized Algorithms

Las Vegas Randomized Algorithm (LVRA): Alwaysgive the correct solution

They are also called zero-sided randomized algorithms

The solution obtained with a LVRA is probabilistic, so“always” means with probability one

Running time may be different from one run to another

We study the average running time

Page 11: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

11

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Las Vegas Viewpoint

Consider discrete random variables

The sample space is discrete and MN possible choicescan be made

In the binary case we have 2N

Finding maximum requires ordering the 2N choices

Las Vegas can be used for ordering real numbers

Example: RQS

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Decentralized Communication Protocol - Randomization

1 4

23

Gossip communication protocol:

1. at time k randomly selectpage i (i=4) for PageRankupdate

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Decentralized Communication Protocol – Outgoing Links

1 4

23

Gossip communication protocol:

1. at time k randomly selectpage i (i=4) for PageRankupdate

2. send PageRank value ofpage i to the outgoing pagesthat are linked to page i

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Decentralized Communication Protocol – Incoming Links

1 4

23

Gossip communication protocol:at time k randomly select page i(i=4) for PageRank update

3. request PageRank values fromincoming pages that arelinked to page i

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Global Clock

The pages taking action are determined via a stochasticprocess θ(k) {1, …, n}

θ(k) is assumed to be i.i.d. with uniform probability

Prob{θ(k)=i} = 1/n

If at time k θ(k) = i then page i initiates PageRank update

θ(k) is a global clock known to all the pages

Clock synchronization problem

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Randomized Update Scheme

Consider the randomized update scheme

x(k+1) = Aθ(k) x(k)

where Aθ(k) are the decentralized link matrices related tothe matrix A

This is a Markovian jump system

Page 12: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

12

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Decentralized Link Matrices - 1

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1/ 3

1/ 3

1/ 3

0

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Decentralized Link Matrices - 2

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

0 0 1/ 3

0 0 1/ 3

0 0 1/ 3

0 0 0 0

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Decentralized Link Matrices - 3

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1 0 0 1/ 3

0 1 0 1/ 3

0 0 1 1/ 3

0 0 0 0

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Decentralized Link Matrices - 4

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1 0 0 1/ 3

0 1 0 1/ 3

0 0 1 1/ 3

0 0 0 0

A

3

1 0 0 0

0 1 1/ 2 0

0 0 0 0

0 0 1/ 2 1

A

Matrices Ai are very sparse

n -1 diagonal identity entries

n non-diagonal entries

Linear # entries

Algorithm easily implementable

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Properties of the Algorithm: Average Matrix Aave

Average matrix Aave = E[ Aθ(k) ]

Taking uniform distribution in the stochastic process θ(k)we have

Aave = 1/n ∑i Ai

Lemma (properties of the average matrix Aave )

(i) Aave = 2/n A + (1-2/n) I

(ii) Matrices A and Aave have the same eigenvectorcorresponding to the eigenvalue 1

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Modified Randomized Update Scheme

Remark: Need to work with the teleportation model(positive stochastic matrix M)

Consider the modified randomized update scheme

x(k+1) = Mθ(k) x(k)

where Mθ(k) are the modified decentralized link matricescomputed as

Mi = (1- r) Ai + r/n S i = 1, 2, …, n

and r (0,1) is a design parameter

r = 2m/(n – mn + 2m)

Page 13: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

13

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Properties of the Algorithm:Average Matrix Mave

Average matrix Mave = E[ Mθ(k) ]

Define r = 2m/(n – mn + 2m)

Lemma (properties of the average matrix Mave)

(i) r (0,1) and r < m =0.15

(ii) Mave = r/m M + (1-r/m) I

(iii) Eigenvalue 1 for Mave is the unique eigenvalue ofmaximum modulus and PageRank is the correspondingeigenvector

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Undesired Oscillations

The state x(k) oscillates and there is no convergence to afixed point…

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Time-Averaging

Time averaging was introduced in the seventies toaccelerate convergence of stochastic approximationalgorithms

With time-averaging we remove oscillations

0

1( ) ( )

1

k

i

y k x ik

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence Properties

Theorem (convergence properties)

The time average y(k) of the modified randomizedupdate scheme converges to PageRank x* in MSE

E[ ||y(k) - x*||2 ] 0 for k

provided that ∑i xi(0) = 1

Proof: Based on the theory of ergodic matrices

H. Ishii, R. Tempo (2010)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Comments

Time average y(k) can be computed recursively as afunction of y(k-1)

Because of averaging convergence rate is 1/k

Sparsity of the matrix A is preserved because

x(k+1) = M x(k) = (1- r) A x(k) + r/n 1

where 1 is a vector with all entries equal to one0 0 0 1 / 3

1 0 1 / 2 1 / 3

0 1 / 2 0 1 / 3

0 1 / 2 1 / 2 0

A

0.038 0.037 0.037 0.321

0.887 0.037 0.462 0.321

0.037 0.462 0.037 0.321

0.037 0.462 0.462 0.037

M =

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Extensions - 1

Extensions to simultaneous

update of multiple web pages

with Bernoulli process

PageRank computation

with lossy communication

Other convergence properties of the algorithm hold

(almost-sure)

H. Ishii, R. Tempo and E. W. Bai (2012)W.-X. Zhao, H.-F. Chen and H. Fang (2013)

Page 14: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

14

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Simulation Results for the Benchmark

There are two clusters of pages with dense link structurescorresponding to higher PageRank

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Ranking Control Journals

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT ISI Web of Knowledge

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Ranking Journals: Impact Factor

Impact Factor IF

Census period (2013) of one year and a window period (2011-2012) of two years

Remark: Impact Factor is a “flat criterion” (it does not take into account where the citations come from)

2013 IF2011- 2012

2011- 2012

number citations in 2013 of articles published in

number of articles published in

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT ISI Web of Knowledge

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Ranking Journals: Eigenfactor

Eigenfactor EF

Ranking journals using ideas from PageRankcomputation in Google

In Eigenfactor journals are considered influential if theyare cited often by other influential journals

What is the probability that a journal is cited?

C. T. Bergstrom (2007)

Page 15: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

15

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Random Reader Model

“Imagine that a researcher is to spend all eternity in the libraryrandomly following citations within scientific periodicals. Theresearcher begins by picking a random journal in the library.From this volume a random citation is selected. The researcherthen walks over to the journal referenced by this citation. Theprobability of doing so corresponds to the fraction of citationslinking the referenced journal to the referring one, conditioned onthe fact that the researcher starts from the referring journal. Fromthis new volume the researcher now selects another randomcitation and proceeds to that journal. This process is repeated adinfinitum.”

A. D. West, T. C. Bergstrom, C. T. Bergstrom (2010)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model: Graph Representation

Directed graph with journals (nodes) and citations (links)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model:Graph Representation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model:Graph Representation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model:Graph Representation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model:Graph Representation

Directed graph with journals (nodes) and citations (links)

Page 16: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

16

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Self-citations are omitted

Normalization to obtain a column substochastic matrix

(note: if 0/0 we set aij= 0)

0

0

0

A

Cross-citation Matrix - 1

i j

i jk jk

aa

a

2013

2008 - 2012

a number citations in from journal j to ijarticles published in journal i in

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

0

0

0

0

A

Black Holes

1 2

34

5

Black holes: journals that do not cite any other journalColumns with entries equal to zeroA substochastic matrix

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Article vector v

vi is the fraction of all published articles coming fromjournal i during the window period 2008-2012

Article vector v is a stochastic vector

Total number of journals n = 8400

Article Vector

2008 - 2012

2008 - 2012i

number articles published by journal i in v

number articles published by all journals in

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

* * * * 0

* * * * 0

* * * * 0

* * * * 0

* * * * 0

A

Cross-citation Matrix – 2

1 2

34

Replace A with stochastic matrix introducing the article vector v

A

5

1v2v

3v 4v

5v

1

2

3

4

5

* * * *

* * * *

* * * *

* * * *

* * * *

v

v

A v

v

v

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Replace substochastic matrix A with stochastic matrix

introducing the article vector v

1

2

3

4

5

* * * *

* * * *

* * * *

* * * *

* * * *

v

v

A v

v

v

* * * * 0

* * * * 0

* * * * 0

* * * * 0

* * * * 0

A

Cross-citation Matrix - 2

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Random Reader Model and Markov Chains

Rank n = 8336 journals in order of importance

Random reader model represented as a Markov chain

where x(k) [0,1]n and ∑i xi(k) = 1 is the journalinfluence vector at step k

xi(k) represents the importance of the journal i

Question: Does the Markov chain converge to astationary value representing the probability of citation?

( 1) ( )x k Ax k

Page 17: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

17

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence of the Markov Chain - 1

Answer: No (recall previous example)

Need to replace with the matrix M obtaining theEigenfactor equation

where is a weighting matrix

T(1 - ) 0.15M m A m v m 1

A

1 1T

n n

v v

v

v v

1

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Teleportation Model

This avoids being trapped in a small set of journals

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence of the Markov Chain - 2

Consider the Markov chain

x(k+1) = M x(k)

If ∑i xi(0) = 1 convergence is guaranteed by PerronTheorem because M is a positive stochastic matrix

x(k) x* for k

The corresponding graph is strongly connected

We have x* = M x*

Journal influence vector x* exists and it is unique

Eigenvector corresponding to the simple eigenvalue 1ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

0

0

0

0

A

0

1

0

0

b

To preserve sparsity of A we write the journal influencevector iteration using power method

where b is the black holes vector having 1 for entriescorresponding to black holes and 0 elsewhere

PageRank Iteration

T( 1) (1- ) ( ) [(1- ) ( ) ]x k m Ax k m b x k +m v

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Eigenfactor EF

Journal influence vector x* is the steady-state fraction oftime spent reading each journal

x* provides weights to the citation values

Eigenfactor EFj of journal j is the

percentage of the total weighted

citations that journal j receives

from all 8336 journals

Article influence score AIj

100 ( *)

( *) j

ji

i

Ax

Ax

EF

0.01

j

jjv

EF

AI

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

2013 Impact Factor 2013 EigenfactorTM

3.4 IEEE CSM 1 Automatica 0.052

3.2 IEEE TAC 2 IEEE TAC 0.051

3.1 Automatica 3 SIAM J Contr & Opt 0.016

2.6 Int J Rob Nonlin Contr 4 Syst & Contr Lett 0.013

2.5 IEEE TCST 5 IEEE TCST 0.013

2.2 J Proc Contr 6 Int J Contr 0.009

1.9 Contr Eng Pract 7 Int J Rob Nonlin Contr 0.009

1.9 Syst & Contr Lett 8 J Proc Contr 0.008

1.4 SIAM J Contr & Opt 9 Contr Eng Pract 0.008

1.2 Math Contr Sign Sys 10 IEEE CSM 0.004

1.1 Int J Contr 11 Europ J Contr 0.002

0.8 Europ J Contr 12 Math Contr Sig Sys 0.001

Page 18: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

18

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Distributed Average Consensus

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Distributed Average Consensus

General problem: Consider a set of N agents each

having a numerical value

This numerical value is communicated to the

neighboring agents iteratively

Consensus: All agents eventually reach a common

value (average of the initial value)

Applications: Sensor networks, load balancing, multi-

vehicle coordination, UAVs, …

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Graph Formulation

Consider a network of N nodes and the graph (V, E)

E is the set of edges V is the set of nodes

graph is undirected

and connected

(no directions, no

self-loops, no

multiple edges)

at time k each node i has a scalar value xi(k); initial value is xi(0)

x5(k)

x1(k)

x4(k)

x3(k)

x2(k)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Distributed Average Consensus

Objective: Derive a randomized algorithm (MC or LV)

such that

1. Nodes update the values xi(k) using neighbor

information

2. The values of the nodes converge to the average of

initial values xi(0)

Three different cases depending on the range of node

values: Real, integer (quantized) and binary values

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Real/Quantized/Binary Agent Values

Real and quantized case are very popular[1]

Monte Carlo (MCMC Markov Chain Monte Carlo)

algorithms are developed in these cases

The binary value case is not really studied in the

systems and control community…

Paradigm: Byzantine Agreement Problem

[1] P.J. Antsaklis, J. Baillieul (2007)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Byzantine Agreement Problem (ByzAgr)

Page 19: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

19

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Byzantium, 1453 AD

Some history: In the year 1453 AD, the city of

Byzantium is under siege… powerful Ottoman

battalions are camped around the city on both sides of

the Bosporus, poised to launch the final attack…

The generals, located in several camps, are trying to

reach an agreement… they can communicate thanks to

the messenger service of the Ottoman Army…

Messages can be delivered certifying the identity of

the sender and preserving its content…

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Byzantium and the Bosporus

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Loyal and Traitor Generals

Loyal generals are trying to reach an agreement on

when they should launch the final attack, but…

Traitor generals are secretly conspiring… their aim is

to confuse the loyal generals so that an insufficient

number of generals is deceived into attacking…

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Byzantine Agreement Problem (ByzAgr)[1]

Find a distributed consensus algorithm satisfying thefollowing conditions:

1. Non-triviality: If all generals have the same input, allloyal generals take a decision equal to the input

2. Agreement: All loyal generals should agree on thedecision

3. Limited dithering: Eventually all loyal generals shouldcome to a decision

[1] L. Lamport, R. Shostak, M. Pease (1982)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Agents and Communication Mechanism

Synchronous: The agents have synchronized clocks

Reliable: A message is guaranteed to be delivered

Authenticated: Identity of the sender is known

Faulty agents: Hard to identify because they behave

maliciously instead of simply crashing; they may

collude to maximize damage

Point-to-point communication: Underlying topology is

that of a undirected and connected graph

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Binary Consensus

Consider N binary agents xi(k) {0, 1} for all i, k

There are Nf < N faulty agents

Binary consensus is achieved if each agent determines

a binary decision value yi {0, 1} such that:

1. All non-faulty agents arrive at the same decision value

2. If all agents have the same initial value xi(0), then the

non-faulty agents reach a common decision yi = xi(0)

Page 20: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

20

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Distributed Agreement Protocol

Distributed agreement protocol proceeds in a sequence

of rounds

At each round each agent sends a binary message

(called vote, e.g. attack or retrieve) to the other agents

Non-faulty agents send the same vote to all the other

agents; faulty agents may send different votes to

different agents

Round is concluded when all agents receive a vote

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Randomized Algorithm for ByzAgr

First randomized algorithm is due to Rabin[1]

There is a global coin toss performed by a trusted

party consisting of

heads or tails

The result of the global coin toss is correctly

transmitted to all agents

All agents equally contribute to the decision

Majority vote: Count the number of votes received[1] M.O. Rabin (1983)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Rabin’s Randomized Algorithm

Nf < N/8; N is a multiple of 8; L = (5N/8) + 1; H = (3N/4) + 1; G = 7N/8

input: vote, output: decision yi, for each round do

1. broadcast vote, receive votes from all other agents

2. set mi(k) ← majority value (0, 1) received

3. set ti(k) ← number of occurrences of mi(k)

4. if coin = heads, then set t(k) ← L, else t(k) ← H

5. if ti(k) t(k), then set vote ← mi(k), else vote ← 0

6. if ti(k) G, then set yi ← mi(k)

¯

¯ ¯

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Main Theorem for ByzAgr

Main Theorem[1]

- Binary consensus is achieved with probability one for

each initial condition

- Expected number of steps is a constant

[1] M.O. Rabin (1983)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT ByzAgr and LVRA

Rabin’s algorithm always (probabilistically) gives a

correct output

Number of rounds is a random variable because the

algorithm is based on randomization

The expected number of steps to reach consensus is a

constant

This algorithm is a LVRA

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Randomized vs Deterministic Algorithms for ByzAgr

Deterministic algorithms requires Nf steps

Impossibility Theorem[1]

If there is no synchronized clock for all agents and if

the processing speed of the agents is different, then no

deterministic algorithm can achieve consensus (even in

the presence of a single faulty agent)

[1] M.J. Fisher, N.A. Lynch, M.S. Paterson (1985)

Page 21: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

21

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Consensus and PageRank

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Graph of Agents

Consider a graph of agents which communicate using adecentralized communication pattern (similar to that forPageRank)

The value of agent i at time k is xi

Values of agents may be updated using a LVRA

x(k+1) = Aθ(k) x(k)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Decentralized Communication Pattern

Stochastic process θ(k) {1, …, d} where d is thenumber of communication patterns

θ(k) is assumed to be i.i.d. with uniform probability

Prob{θ(k)=i} = 1/d

If at time k θ(k) = i then agent i initiates update

Technical differences with communication protocol forPageRank but the ideas are similar

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Communication Pattern for Agent 1

1 4

2

1

* 0 0 *

* * 0 0

0 0 * 0

0 0 0 *

A

3

A1 is row stochastic1

1/ 2 0 0 1/ 2

1/ 2 1/ 2 0 0

0 0 1 0

0 0 0 1

A

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Consensus

We say that consensus w.p.1 is achieved if for any initialcondition x(0) we have

Prob

for all agents i, j lim ( ) ( ) 0 1i j

kx k x k

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Convergence Properties

Lemma (convergence properties)

Assuming that the graph is strongly connected, thescheme

x(k+1) = Aθ(k) x(k)

achieves consensus w.p.1

Prob

for all agents i, j and for any initial condition x(0)

Y. Hatano, M. Mesbahi (2005); C. W. Wu (2006)A. Tahbaz-Salehi, A. Jadbabaie (2008)

lim ( ) ( ) 0 1i jk

x k x k

Page 22: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

22

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT PageRank and Consensus

Consensus PageRank

all agent values become equal page values converge to constant

strongly connected agent graph web is not strongly connected

Ai are row stochastic matrices Ai and Mi are column stochastic

self-loops in the agent graph no self-loops in the web

convergence for any x(0) x(0) stochastic vector

time averaging y(k) not necessary averaging y(k) required

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Economic Complexity

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Economic Complexity

Introduced in the paper

Hidalgo C. and Hausmann R. “The Building Blocks ofEconomic Complexity,” Proceedings of the NationalAcademy of Science, 106, 2009

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Bipartite Graphs - 1

Bipartite graph representing countries and products

countries v productsxv={v1,v2,v3,v4,v5} xu={u1,u2,u3,u4}

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Bipartite Graphs - 2

Bipartite graph representing countries and products

countries v products

xv={v1,v2,v3,v4,v5} xu={u1,u2,u3,u4}

Construct a binary cross-product matrix M

Compute PageRank xv* and xu* representing the fitnessof countries and the quality of products

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Economic Complexity

Tripartite graph representing countries, capabilities andproducts

theory of

hidden

capabilities

Study convergence of algorithms for PageRank

xv(k+1) = xv(k) and xu(k+1) = xu(k)

Page 23: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

23

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Distributed Algorithms for Web Aggregation

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Aggregation, PageRank and Complexity

Aggregation is a tool to break complexity

Applications:

o webo power grido biologyo social networkso journalso e-commerce

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT PageRank and Web Aggregation

View the web as a network of agents connected via links

Web aggregation

Approximate computation

of PageRank to reduce

computation cost and

communication at servers

Guaranteed bounds on the

approximation error

H. Ishii, R. Tempo, E.-W. Bai (2012)ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Graph Representation

1 4

2

6

3

5

0 1/2 0 0 0 0

1/ 2 0 1/3 0 0 0

0 1/2 0 1/3 0 0

1/ 2 0 1/3 0 0 1/2

0 0 0 1/3 0 1/2

0 0 1/3 1/3 1 0

A

T* 0.0614 0.0857 0.122 0.214 0.214 0.302x

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Web Aggregation

1 4

2

6

3

5

0 1/2 0 0 0 0

1/ 2 0 1/3 0 0 0

0 1/2 0 1/3 0 0

1/ 2 0 1/3 0 0 1/2

0 0 0 1/3 0 1/2

0 0 1/3 1/3 1 0

A

T* 0.0614 0.0857 0.122 0.214 0.214 0.302x U1

U2

U3

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Aggregation of Agents

1 4

2

6

3

5

U1

U2

U3

U3

U2

U1

Page 24: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

24

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Web Aggregation

Most links are intra‐host ones

Two-step approach:

1. Global step: Compute total PageRank for each group

2. Local step: Distribute value among group members

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Web Aggregation

Singular perturbation

methods for PageRank

computation

Web pages having many external outlinks are consideredas single groups

number external outlinksδ

number outlinks

J.H. Chow, P.V. Kokotovic (1985)H. Ishii, R. Tempo, E.W. Bai (2012)

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Coordinate Change for PageRank

where is the aggregated PageRank (based on groupvalue), V1 and V2 are block diagonal matrices (afterreordering)

Objective: design a distributed randomized algorithm forcomputing approximately PageRank x* based on thecomputation of

* *x Vx*

1* *1*

22

Vxx x

Vx

*1x

*1x

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank equation

x* = (1- m) A x* + m/n 1

Internal: block diag stochastic

External: block diag norm bounded by 2

internalcommunications in the same group

external communications

inter-groups

PageRank Equation: Internal/External Communications

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

11

21 2

* *1 12

2

1* *2 2

(1 )0

A

A A

ux xA mm

nx x

New PageRank equation

where u = V1 1

is approximate PageRank

Approximate PageRank Equation

A A1 111A AI2 221 22

0(1 )

0

ux xA mm

nx xA A

Ax

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

A11 1

A1 (( 1) (1 ))

0A x

umx k m

nk

1. Group values: reduced order computation

2. Iteration

communications inter-groups

communications inter-groups

Aggregated PageRank Algorithm - 1

A A 1A A22 12 22 1( 1) ( ) ((1 )[ 1 )( ) ]x k x k m I- A A x km

A22A

A11 1 ( )A x k

A21 1 ( )A x k

communications

in the same group

Page 25: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

25

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

3. Inverse state transformation

Theorem (convergence properties):

xA(k) xA for k in MSE sense provided that

xA (0) is a stochastic vector

1A A( ) ( )Vx k x k

Aggregated PageRank Algorithm - 2

1V

block diagonal matrix

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Approximation Error

Approximate PageRank

Theorem (approximation error):

Given ε > 0, take

then

A A1 111A AI2 221 22

0(1 )

0

ux xA mm

nx xA A

A 1 Ax V x

number external outlinks 4δ

number outlinks 4(1 )(1 )m

* A1|| - ||x x

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

Centrality Measures

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

How central are specific nodes or edges in a network?

PageRank and Network Centrality Measures

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Other Centrality Measures

Degree

Closeness

Betweenness

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Degree Centrality

Degree centrality: for each node count the number ofincoming links

Page 26: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

26

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Closeness Centrality

Closeness centrality: a node is more central if it is closerto most of the other nodes

Defined as the “distance” from all the other nodes

2 => 1 dist = 1

3 => 1 dist = 2

4 => 1 dist = 3

5 => 1 dist = 5

6 => 1 dist = 4

total = 15ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Closeness Centrality

Closeness centrality: a node is more central if it is closerto most of the other nodes

Defined as the “distance” from all the other nodes

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Betweenness Centrality

Betweenness centrality ranks a node higher if it belongsto many paths between other nodes

1

# 1, 1B

# , 1

shortest paths i j passing through i j

shortest paths i j i j

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Betweenness Centrality

2 => 3 0 4 => 5 0 4 => 6 0 5 => 6 0

2 => 4 1/2 total = 1/2 + 1/3 = 5/6

2 => 5 1/3

2 => 6 0

3 => 4 0

3 => 5 0

3 => 6 0

1

# 1, 1B

# , 1

shortest paths i j passing through i j

shortest paths i j i j

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Betweenness Centrality

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Comparison of Centrality Measures

Page 27: PageRank, Centrality Measures and Consensus: A Systems and ...assets.disi.unitn.it › ... › documents › courses › 30022 › EECI_trento-pa… · PageRank is a numerical value

27

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

On-Going Work: Distributed Randomized Algorithms

Design distributed randomized algorithms for

centrality measures

Degree and closeness are “easy”

Betweenness computation is very hard for general

graphs

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

On-Going Work: Temporal Networks

Fixed number of nodes

Time-varying links

Need a new definition of PageRank

Design distributed randomized algorithms

Establish convergence properties

Goal: Handle web spam effectively

Avoid deliberate manipulation of search engines

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

References

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Key References

H. Ishii and R. Tempo, “The PageRank Problem, Multi-Agent Consensus and Web Aggregation,,” IEEE CSM,2014

R. Tempo, G. Calafiore, F. Dabbene, “RandomizedAlgorithms for Analysis and Control of UncertainSystems, with Applications,” Second Edition,Springer-Verlag, London, 2013

… more references on my website…

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT PageRank (Other References)

O. Fercoq, M. Akian, M. Bouhtou, S. Gaubert, “PageRank,Markov Chains and Ergodic Control,” IEEE TAC, 2013

W.-X. Zhao, H.-F. Chen, H.T. Fang, “Almost-Sure Convergenceof the Randomized Scheme,” IEEE TAC, 2013

A.V. Nazin and B.T. Polyak. “Randomized Algorithm toDetermine the Eigenvector of a Stochastic Matrix,” Automationand Remote Control, 2011

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Eigenfactor References

J.D. West, T.C. Bergstrom, C.T. Bergstrom, “The EigenfactorMetrics: A Network Approach to Assessing Scholarly Journals,”College of Research Libraries, 2010

C.T. Bergstrom, “Eigenfactor: Measuring the Value and Prestigeof Scholarly Journals,” C&RL News, 2007

L. Leydesdorff, “How are New Citation-Based Journal IndicatorsAdding to the Bibliometric Toolbox?,” Journal of the AmericanSociety for Information Science and Technology, 2008