pagerank, centrality measures and consensus: a systems and ...assets.disi.unitn.it › ... ›...

1

ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank, Centrality Measures and Consensus: A Systems and Control Viewpoint

Roberto Tempo

CNR-IEIIT

Consiglio Nazionale delle Ricerche

Politecnico di Torino

[email protected]


CNR-IEIIT

PageRank Problem


CNR-IEIIT Website of Umberto Eco

Author of

“The Name of the

Rose” translated

in more than

40 languages

Looking for

“Umberto Eco”

in Google we

find the websiteICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

PageRank is a numerical value in the interval [0,1]

Using a PageRank checker we compute

“PageRank is Google’s view of the importance of this page”

“PageRank reflects our view of the importance of Web pages byconsidering more than 500 million variables and 2 billion terms.Pages that are considered important receive a higher PageRankand are more likely to appear at the top of the search results”

PageRank for Umberto Eco


CNR-IEIIT

PageRank is used in search engines to indicate theimportance of the page currently visited

PageRank has broader utility than search engines

Used in various areas for ranking other “objects”

o scientific journals (Eigenfactor)o economic complexityo cancer biology and protein detectiono ranking top sport playerso …

PageRank


CNR-IEIIT Random Surfer Model

Network consisting of servers (nodes) connected bydirected communication links

Web surfer moves along randomly following thehyperlink structure

When arriving at a page with several outgoing links, oneis chosen at random, then the random surfer moves to anew page, and so on…

2



Web representation with incoming and outgoing links





Pick an outgoing link at random



Arriving at a new web page



Pick another outgoing link at random



3







If a page is important then it is visited more often...

The time the random surfer spends on a page is ameasure of its importance

If important pages point to your page, then your pagebecomes important because it is visited often

What is the probability that your page is visited?

Need to rank the pages in order of importance forfacilitating the web search


CNR-IEIIT Graph Representation

1 2

34

5

Directed graph with nodes (pages) and links representing the web

Graph is not necessarily strongly connected (from 5 you cannot reach other pages)

Graph is constructed using crawlers and spiders moving continuously along the web


CNR-IEIIT

For each node we count the number of outgoing links and normalize their sum to 1

Hyperlink matrix is a nonnegative (column) substochastic matrix

Hyperlink Matrix

1 2

3 4

5


CNR-IEIIT Hyperlink Matrix

1 2

3 4

5

0 1 0 0 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 0 0

0 0 1 0 0

A

4


CNR-IEIIT Page 5 is a Dangling Node

1 2

34

5

0 1 0 0 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 1 / 2 0

1 / 3 0 0 0 0

0 0 1 0 0

A


CNR-IEIIT

Dangling nodes are pageshaving no outgoing link

Problem not well-posed

Probability of page 5increases and increases…probability of the otherpages decreases anddecreases…

Eventually we obtain p5 = 1and p1 = p2 = p3 = p4 = 0

Dangling Nodes

1 2

34

5


CNR-IEIIT Dangling Nodes

Example: pdf file with no

hyperlink

Benchmark: Web Lincoln

University, New Zealand

3756 nodes

31718 total #outgoing links

H. Ishii, R. Tempo (2014)



3255 dangling nodes (85%)

Red dots outgoing

links toward dangling

nodes

Blue dots are normal

links

White area corresponds

to no-links



Random surfer gets stuck when visiting a pdf file

In this case the “back button” of the browser is used

Easy fix: Add new links to make the matrix stochastic


CNR-IEIIT Easy Fix: Add New Link

1 2

3 4

5

We add a new outgoinglink from page 5 to page 3

0 1 0 0 0

1/ 3 0 0 1/ 2 0

1/ 3 0 0 1/ 2

1/ 3 0 0 0

0

1

0

0 0 1 0

A

In the benchmark this fix increases the #links from to 31718 to 40646

5


CNR-IEIIT Assumption: No Dangling Nodes

This is a web modeling problem

Assume that there are no dangling nodes or add artificiallinks

Hyperlink matrix A is a nonnegative stochastic matrix(instead of substochastic)


CNR-IEIIT

Random Surfer Model and Markov Chains

Random surfer model is represented as a Markov chain

where x(k) is a probability vector

x(k) [0,1]n and ∑i xi(k) = 1

xi(k) represents the importance of the page i at time k

( 1) ( )x k Ax k


CNR-IEIIT Convergence of the Markov Chain

Question: Does the Markov chain converge to astationary value

x(k) x* for k

representing the probability that the pages are visited?

Answer: No

Example:0 1 0 0

0 0 1 0

0 0 0 1

1 0 0 0

A

0

1(0)

0

0

x

x1(k)

k

1 -


CNR-IEIIT Teleportation Model

Recall that the matrix A is a nonnegative stochasticmatrix

We introduce a different model

Teleportation: After a while the random surfer gets boredand decides to “jump” to another page not directlyconnected to that currently visited

New page may be geographically or content-basedlocated far away


CNR-IEIIT Recall the Random Surfer Model

Web representation with incoming and outgoing links


CNR-IEIIT Recall the Random Surfer Model

6



We are “teleported” to a web page located far away


CNR-IEIIT Random Surfer Model Again



CNR-IEIIT Random Surfer Model Again



CNR-IEIIT Teleportation Model Again

We are teleported to another web page located far away


CNR-IEIIT Convex Combination of Matrices

Teleportation model is represented as a convexcombination of matrices A and S/n

S = 1 1T is a rank-one matrix

1 vector with all entries equal to one

Consider a matrix M defined as

M = (1 - m) A + m/n S m (0,1)

where n is the number of pages

The value m = 0.15 is used at Google

1 1

1 1

S


CNR-IEIIT Matrix M

M is a convex combination of two nonnegativestochastic matrices and m (0,1)

M is a strictly positive stochastic matrix

7


CNR-IEIIT Convergence of the Markov Chain

Consider the Markov chain

x(k+1) = M x(k)

where M is a strictly positive stochastic matrix

If ∑i xi(0) = 1 convergence is guaranteed by PerronTheorem

x(k) x* for k

x* = M x* = [(1 - m) A + m/n S] x* m (0,1)

Corresponding graph is strongly connected


CNR-IEIIT

PageRank: Bringing Order to the Web

Rank n web pages in order of importance

Ranking is provided by x*

PageRank x* of the hyperlink matrix M is defined as

x*=M x* where x* [0,1]n and ∑i xi* = 1

S. Brin, L. Page (1998)S. Brin, L. Page, R. Motwani, T. Winograd (1999)


CNR-IEIIT

PageRank: Bringing Order to the Web

Rank n web pages in order of importance

Ranking is provided by x*

PageRank x* of the hyperlink matrix M is defined as

x*=M x* where x* [0,1]n and ∑i xi* = 1

x* is the stationary distribution of the Markov Chain(steady-state probability that pages are visited is x* )

x* is a nonnegative unit eigenvector corresponding to theeigenvalue 1 of M


CNR-IEIIT

Positive Stochastic Matrices and Perron Theorem

The eigenvalue 1 of M is a simple eigenvalue and anyother eigenvalue (possibly complex) is strictly smallerthan 1 in absolute value

There exists an eigenvector x* of M with eigenvalue 1such that all components of x* are positive

M is irreducible and the corresponding graph is stronglyconnected (for any two vertices i,j there is a sequence ofedges which connects i to j)

Stationary probability vector x* is the eigenvector of M

x* exists and it is unique


CNR-IEIIT

Two vertices i, j V of an undirected graph G V ,E)are connected if there is a path between i and j

A graph is connected if every pair of vertices in thegraph is connected

A directed graph is a graph where the edges have adirection associated with them

Oriented graphs are directed graphs with no self-loops,no multiple adjacencies and no 2-cycles

Undirected, Directed, Oriented Graphs


CNR-IEIIT

A directed graph is called weakly connected if replacingall its directed edges with undirected edges produces aconnected (undirected) graph

A graph is said to be strongly connected if every vertexis reachable from every other vertex

Weakly and Strongly Connected Graphs

8


CNR-IEIIT

PageRank Computation


CNR-IEIIT

PageRank Computation withPower Method

PageRank is computed with the power method

x(k+1) = M x(k)

If ∑i xi(0) = 1 convergence is guaranteed because M is astrictly positive stochastic matrix (Perron Theorem)

x(k) x* for k

PageRank computation requires 50-100 iterations (40 inthe benchmark)

This computation takes about a week and it is performedcentrally at Google once a month


CNR-IEIIT Why m = 0.15?

Asymptotic rate of convergence of power method isexponential and depends on the ratio

We have

1(M) = 1 | 2(M) | ≤ 1 - m = 0.85

For Nit= 50 we have 0.8550≈ 2.95 10-4

For Nit= 100 we have 0.85100≈ 8.74 10-8

Larger m implies faster convergence, but numericallyunstable

2

1

| λ |

| λ | 1


CNR-IEIIT

PageRank Computation with Power Method

1 4

23

0.038 0.037 0.037 0.321

0.887 0.037 0.462 0.321

0.037 0.462 0.037 0.321

0.037 0.462 0.462 0.037

M

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

T* 0.12 0.33 0.26 0.29x

0.15m


CNR-IEIIT Size of the Web

The size of M is more than 8 billion (and it isincreasing)!

Sparsity in the web: 1020 entries

1012 non-zero entries


CNR-IEIIT

Why m = 0.15?Answer 2: Sensitivity

Study sensitivity wrt m taking

We obtain for i = 1, 2,…, n

A deeper analysis shows that if 2(M) is close to 1(M)then x*(m) is very sensitive for small m

Conclusion: m=0.15 is a good compromise

*d( )

d ix mm

*d 1( ) 6.66

d ix mm m

9


CNR-IEIIT Columbia River, The Dalles, Oregon


CNR-IEIIT Distributed Viewpoint

More and more computing power is needed…

… the distributed viewpoint because global informationabout the network is difficult to obtain


CNR-IEIIT

Randomized Decentralized Algorithms


CNR-IEIIT

Main idea: Develop Las Vegas randomized decentralized

algorithms for computing PageRank

Randomized Decentralized Algorithms for PageRank Computation


CNR-IEIIT

Monte Carlo and Las Vegas Randomized Algorithms


CNR-IEIIT Monte Carlo and Las Vegas

Monte Carlo was invented by Metropolis, Ulam, vonNeumann, Fermi, … in the fourties (Manhattan project)

Metropolis Fermi Ulam, Feymann, von Neumann

Las Vegas first appeared in computer science in the lateseventies

10


CNR-IEIIT Randomized Algorithm: Definition

Randomized Algorithm (RA): An algorithm that makesrandom choices during its execution to produce a result

Example: Matlab code

set_r =1:0.01:3;for k =1:length(set_r)

if (rand > 0.5) then a_opt(k) = hel(k);else a_opt(k) = 3.7;end if

a_lin(k) =(e/(e-1))*r;a_sub(k) =(a/(a-1))*(r+log(a)-1);

end


CNR-IEIIT Las Vegas Randomized Algorithm

Las Vegas Randomized Algorithm (LVRA): Arandomized algorithm that always produces correctresults, the only variation from one run to another is therunning time


CNR-IEIIT

Example of Las Vegas: Discrete Random Variables

q1 q2 q3 q4 q5 q6 q7 q8 q9 q10

Consider discrete random variables


CNR-IEIIT


q1 q2 q3 q4 q5 q6 q7 q8 q9 q10



CNR-IEIIT


q1 q2 q3 q4 q5 q6 q7 q8 q9 q10



CNR-IEIIT Las Vegas Randomized Algorithms

Las Vegas Randomized Algorithm (LVRA): Alwaysgive the correct solution

They are also called zero-sided randomized algorithms

The solution obtained with a LVRA is probabilistic, so“always” means with probability one

Running time may be different from one run to another

We study the average running time

11


CNR-IEIIT Las Vegas Viewpoint


The sample space is discrete and MN possible choicescan be made

In the binary case we have 2N

Finding maximum requires ordering the 2N choices

Las Vegas can be used for ordering real numbers

Example: RQS


CNR-IEIIT

Decentralized Communication Protocol - Randomization

1 4

23

Gossip communication protocol:

1. at time k randomly selectpage i (i=4) for PageRankupdate


CNR-IEIIT

Decentralized Communication Protocol – Outgoing Links

1 4

23

Gossip communication protocol:

1. at time k randomly selectpage i (i=4) for PageRankupdate

2. send PageRank value ofpage i to the outgoing pagesthat are linked to page i


CNR-IEIIT

Decentralized Communication Protocol – Incoming Links

1 4

23

Gossip communication protocol:at time k randomly select page i(i=4) for PageRank update

3. request PageRank values fromincoming pages that arelinked to page i


CNR-IEIIT Global Clock

The pages taking action are determined via a stochasticprocess θ(k) {1, …, n}

θ(k) is assumed to be i.i.d. with uniform probability

Prob{θ(k)=i} = 1/n

If at time k θ(k) = i then page i initiates PageRank update

θ(k) is a global clock known to all the pages

Clock synchronization problem


CNR-IEIIT Randomized Update Scheme

Consider the randomized update scheme

x(k+1) = Aθ(k) x(k)

where Aθ(k) are the decentralized link matrices related tothe matrix A

This is a Markovian jump system

12


CNR-IEIIT Decentralized Link Matrices - 1

0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1/ 3

1/ 3

1/ 3

0

A



0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

0 0 1/ 3

0 0 1/ 3

0 0 1/ 3

0 0 0 0

A



0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1 0 0 1/ 3

0 1 0 1/ 3

0 0 1 1/ 3

0 0 0 0

A



0 0 0 1/ 3

1 0 1/ 2 1/ 3

0 1/ 2 0 1/ 3

0 1/ 2 1/ 2 0

A

4

1 0 0 1/ 3

0 1 0 1/ 3

0 0 1 1/ 3

0 0 0 0

A

3

1 0 0 0

0 1 1/ 2 0

0 0 0 0

0 0 1/ 2 1

A

Matrices Ai are very sparse

n -1 diagonal identity entries

n non-diagonal entries

Linear # entries

Algorithm easily implementable


CNR-IEIIT

Properties of the Algorithm: Average Matrix Aave

Average matrix Aave = E[ Aθ(k) ]

Taking uniform distribution in the stochastic process θ(k)we have

Aave = 1/n ∑i Ai

Lemma (properties of the average matrix Aave )

(i) Aave = 2/n A + (1-2/n) I

(ii) Matrices A and Aave have the same eigenvectorcorresponding to the eigenvalue 1


CNR-IEIIT Modified Randomized Update Scheme

Remark: Need to work with the teleportation model(positive stochastic matrix M)

Consider the modified randomized update scheme

x(k+1) = Mθ(k) x(k)

where Mθ(k) are the modified decentralized link matricescomputed as

Mi = (1- r) Ai + r/n S i = 1, 2, …, n

and r (0,1) is a design parameter

r = 2m/(n – mn + 2m)

13


CNR-IEIIT

Properties of the Algorithm:Average Matrix Mave

Average matrix Mave = E[ Mθ(k) ]

Define r = 2m/(n – mn + 2m)

Lemma (properties of the average matrix Mave)

(i) r (0,1) and r < m =0.15

(ii) Mave = r/m M + (1-r/m) I

(iii) Eigenvalue 1 for Mave is the unique eigenvalue ofmaximum modulus and PageRank is the correspondingeigenvector


CNR-IEIIT Undesired Oscillations

The state x(k) oscillates and there is no convergence to afixed point…


CNR-IEIIT Time-Averaging

Time averaging was introduced in the seventies toaccelerate convergence of stochastic approximationalgorithms

With time-averaging we remove oscillations

0

1( ) ( )

1

k

i

y k x ik


CNR-IEIIT Convergence Properties

Theorem (convergence properties)

The time average y(k) of the modified randomizedupdate scheme converges to PageRank x* in MSE

E[ ||y(k) - x*||2 ] 0 for k

provided that ∑i xi(0) = 1

Proof: Based on the theory of ergodic matrices

H. Ishii, R. Tempo (2010)


CNR-IEIIT Comments

Time average y(k) can be computed recursively as afunction of y(k-1)

Because of averaging convergence rate is 1/k

Sparsity of the matrix A is preserved because

x(k+1) = M x(k) = (1- r) A x(k) + r/n 1

where 1 is a vector with all entries equal to one0 0 0 1 / 3

1 0 1 / 2 1 / 3

0 1 / 2 0 1 / 3

0 1 / 2 1 / 2 0

A

0.038 0.037 0.037 0.321

0.887 0.037 0.462 0.321

0.037 0.462 0.037 0.321

0.037 0.462 0.462 0.037

M =


CNR-IEIIT Extensions - 1

Extensions to simultaneous

update of multiple web pages

with Bernoulli process

PageRank computation

with lossy communication

Other convergence properties of the algorithm hold

(almost-sure)

H. Ishii, R. Tempo and E. W. Bai (2012)W.-X. Zhao, H.-F. Chen and H. Fang (2013)

14


CNR-IEIIT Simulation Results for the Benchmark

There are two clusters of pages with dense link structurescorresponding to higher PageRank


CNR-IEIIT

Ranking Control Journals


CNR-IEIIT ISI Web of Knowledge


CNR-IEIIT Ranking Journals: Impact Factor

Impact Factor IF

Census period (2013) of one year and a window period (2011-2012) of two years

Remark: Impact Factor is a “flat criterion” (it does not take into account where the citations come from)

2013 IF2011- 2012

2011- 2012

number citations in 2013 of articles published in

number of articles published in


CNR-IEIIT ISI Web of Knowledge


CNR-IEIIT Ranking Journals: Eigenfactor

Eigenfactor EF

Ranking journals using ideas from PageRankcomputation in Google

In Eigenfactor journals are considered influential if theyare cited often by other influential journals

What is the probability that a journal is cited?

C. T. Bergstrom (2007)

15


CNR-IEIIT Random Reader Model

“Imagine that a researcher is to spend all eternity in the libraryrandomly following citations within scientific periodicals. Theresearcher begins by picking a random journal in the library.From this volume a random citation is selected. The researcherthen walks over to the journal referenced by this citation. Theprobability of doing so corresponds to the fraction of citationslinking the referenced journal to the referring one, conditioned onthe fact that the researcher starts from the referring journal. Fromthis new volume the researcher now selects another randomcitation and proceeds to that journal. This process is repeated adinfinitum.”

A. D. West, T. C. Bergstrom, C. T. Bergstrom (2010)


CNR-IEIIT

Random Reader Model: Graph Representation

Directed graph with journals (nodes) and citations (links)


CNR-IEIIT

Random Reader Model:Graph Representation


CNR-IEIIT



CNR-IEIIT



CNR-IEIIT


Directed graph with journals (nodes) and citations (links)

16


CNR-IEIIT

Self-citations are omitted

Normalization to obtain a column substochastic matrix

(note: if 0/0 we set aij= 0)

0

0

0

A

Cross-citation Matrix - 1

i j

i jk jk

aa

a

2013

2008 - 2012

a number citations in from journal j to ijarticles published in journal i in


CNR-IEIIT

0

0

0

0

A

Black Holes

1 2

34

5

Black holes: journals that do not cite any other journalColumns with entries equal to zeroA substochastic matrix


CNR-IEIIT

Article vector v

vi is the fraction of all published articles coming fromjournal i during the window period 2008-2012

Article vector v is a stochastic vector

Total number of journals n = 8400

Article Vector

2008 - 2012

2008 - 2012i

number articles published by journal i in v

number articles published by all journals in


CNR-IEIIT

* * * * 0

* * * * 0

* * * * 0

* * * * 0

* * * * 0

A

Cross-citation Matrix – 2

1 2

34

Replace A with stochastic matrix introducing the article vector v

A

5

1v2v

3v 4v

5v

1

2

3

4

5

* * * *

* * * *

* * * *

* * * *

* * * *

v

v

A v

v

v


CNR-IEIIT

Replace substochastic matrix A with stochastic matrix

introducing the article vector v

1

2

3

4

5

* * * *

* * * *

* * * *

* * * *

* * * *

v

v

A v

v

v

* * * * 0

* * * * 0

* * * * 0

* * * * 0

* * * * 0

A

Cross-citation Matrix - 2

A


CNR-IEIIT

Random Reader Model and Markov Chains

Rank n = 8336 journals in order of importance

Random reader model represented as a Markov chain

where x(k) [0,1]n and ∑i xi(k) = 1 is the journalinfluence vector at step k

xi(k) represents the importance of the journal i

Question: Does the Markov chain converge to astationary value representing the probability of citation?

( 1) ( )x k Ax k

17


CNR-IEIIT Convergence of the Markov Chain - 1

Answer: No (recall previous example)

Need to replace with the matrix M obtaining theEigenfactor equation

where is a weighting matrix

T(1 - ) 0.15M m A m v m 1

A

1 1T

n n

v v

v

v v

1



This avoids being trapped in a small set of journals


CNR-IEIIT Convergence of the Markov Chain - 2

Consider the Markov chain

x(k+1) = M x(k)

If ∑i xi(0) = 1 convergence is guaranteed by PerronTheorem because M is a positive stochastic matrix

x(k) x* for k

The corresponding graph is strongly connected

We have x* = M x*

Journal influence vector x* exists and it is unique

Eigenvector corresponding to the simple eigenvalue 1ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT

0

0

0

0

A

0

1

0

0

b

To preserve sparsity of A we write the journal influencevector iteration using power method

where b is the black holes vector having 1 for entriescorresponding to black holes and 0 elsewhere

PageRank Iteration

T( 1) (1- ) ( ) [(1- ) ( ) ]x k m Ax k m b x k +m v


CNR-IEIIT Eigenfactor EF

Journal influence vector x* is the steady-state fraction oftime spent reading each journal

x* provides weights to the citation values

Eigenfactor EFj of journal j is the

percentage of the total weighted

citations that journal j receives

from all 8336 journals

Article influence score AIj

100 ( *)

( *) j

ji

i

Ax

Ax

EF

0.01

j

jjv

EF

AI


CNR-IEIIT

2013 Impact Factor 2013 EigenfactorTM

3.4 IEEE CSM 1 Automatica 0.052

3.2 IEEE TAC 2 IEEE TAC 0.051

3.1 Automatica 3 SIAM J Contr & Opt 0.016

2.6 Int J Rob Nonlin Contr 4 Syst & Contr Lett 0.013

2.5 IEEE TCST 5 IEEE TCST 0.013

2.2 J Proc Contr 6 Int J Contr 0.009

1.9 Contr Eng Pract 7 Int J Rob Nonlin Contr 0.009

1.9 Syst & Contr Lett 8 J Proc Contr 0.008

1.4 SIAM J Contr & Opt 9 Contr Eng Pract 0.008

1.2 Math Contr Sign Sys 10 IEEE CSM 0.004

1.1 Int J Contr 11 Europ J Contr 0.002

0.8 Europ J Contr 12 Math Contr Sig Sys 0.001

18


CNR-IEIIT

Distributed Average Consensus


CNR-IEIIT Distributed Average Consensus

General problem: Consider a set of N agents each

having a numerical value

This numerical value is communicated to the

neighboring agents iteratively

Consensus: All agents eventually reach a common

value (average of the initial value)

Applications: Sensor networks, load balancing, multi-

vehicle coordination, UAVs, …


CNR-IEIIT Graph Formulation

Consider a network of N nodes and the graph (V, E)

E is the set of edges V is the set of nodes

graph is undirected

and connected

(no directions, no

self-loops, no

multiple edges)

at time k each node i has a scalar value xi(k); initial value is xi(0)

x5(k)

x1(k)

x4(k)

x3(k)

x2(k)


CNR-IEIIT Distributed Average Consensus

Objective: Derive a randomized algorithm (MC or LV)

such that

1. Nodes update the values xi(k) using neighbor

information

2. The values of the nodes converge to the average of

initial values xi(0)

Three different cases depending on the range of node

values: Real, integer (quantized) and binary values


CNR-IEIIT Real/Quantized/Binary Agent Values

Real and quantized case are very popular[1]

Monte Carlo (MCMC Markov Chain Monte Carlo)

algorithms are developed in these cases

The binary value case is not really studied in the

systems and control community…

Paradigm: Byzantine Agreement Problem

[1] P.J. Antsaklis, J. Baillieul (2007)


CNR-IEIIT

Byzantine Agreement Problem (ByzAgr)

19


CNR-IEIIT Byzantium, 1453 AD

Some history: In the year 1453 AD, the city of

Byzantium is under siege… powerful Ottoman

battalions are camped around the city on both sides of

the Bosporus, poised to launch the final attack…

The generals, located in several camps, are trying to

reach an agreement… they can communicate thanks to

the messenger service of the Ottoman Army…

Messages can be delivered certifying the identity of

the sender and preserving its content…


CNR-IEIIT Byzantium and the Bosporus


CNR-IEIIT Loyal and Traitor Generals

Loyal generals are trying to reach an agreement on

when they should launch the final attack, but…

Traitor generals are secretly conspiring… their aim is

to confuse the loyal generals so that an insufficient

number of generals is deceived into attacking…


CNR-IEIIT

Byzantine Agreement Problem (ByzAgr)[1]

Find a distributed consensus algorithm satisfying thefollowing conditions:

1. Non-triviality: If all generals have the same input, allloyal generals take a decision equal to the input

2. Agreement: All loyal generals should agree on thedecision

3. Limited dithering: Eventually all loyal generals shouldcome to a decision

[1] L. Lamport, R. Shostak, M. Pease (1982)


CNR-IEIIT Agents and Communication Mechanism

Synchronous: The agents have synchronized clocks

Reliable: A message is guaranteed to be delivered

Authenticated: Identity of the sender is known

Faulty agents: Hard to identify because they behave

maliciously instead of simply crashing; they may

collude to maximize damage

Point-to-point communication: Underlying topology is

that of a undirected and connected graph


CNR-IEIIT Binary Consensus

Consider N binary agents xi(k) {0, 1} for all i, k

There are Nf < N faulty agents

Binary consensus is achieved if each agent determines

a binary decision value yi {0, 1} such that:

1. All non-faulty agents arrive at the same decision value

2. If all agents have the same initial value xi(0), then the

non-faulty agents reach a common decision yi = xi(0)

20


CNR-IEIIT Distributed Agreement Protocol

Distributed agreement protocol proceeds in a sequence

of rounds

At each round each agent sends a binary message

(called vote, e.g. attack or retrieve) to the other agents

Non-faulty agents send the same vote to all the other

agents; faulty agents may send different votes to

different agents

Round is concluded when all agents receive a vote


CNR-IEIIT Randomized Algorithm for ByzAgr

First randomized algorithm is due to Rabin[1]

There is a global coin toss performed by a trusted

party consisting of

heads or tails

The result of the global coin toss is correctly

transmitted to all agents

All agents equally contribute to the decision

Majority vote: Count the number of votes received[1] M.O. Rabin (1983)


CNR-IEIIT Rabin’s Randomized Algorithm

Nf < N/8; N is a multiple of 8; L = (5N/8) + 1; H = (3N/4) + 1; G = 7N/8

input: vote, output: decision yi, for each round do

1. broadcast vote, receive votes from all other agents

2. set mi(k) ← majority value (0, 1) received

3. set ti(k) ← number of occurrences of mi(k)

4. if coin = heads, then set t(k) ← L, else t(k) ← H

5. if ti(k) t(k), then set vote ← mi(k), else vote ← 0

6. if ti(k) G, then set yi ← mi(k)

¯

¯ ¯


CNR-IEIIT Main Theorem for ByzAgr

Main Theorem[1]

- Binary consensus is achieved with probability one for

each initial condition

- Expected number of steps is a constant

[1] M.O. Rabin (1983)


CNR-IEIIT ByzAgr and LVRA

Rabin’s algorithm always (probabilistically) gives a

correct output

Number of rounds is a random variable because the

algorithm is based on randomization

The expected number of steps to reach consensus is a

constant

This algorithm is a LVRA


CNR-IEIIT

Randomized vs Deterministic Algorithms for ByzAgr

Deterministic algorithms requires Nf steps

Impossibility Theorem[1]

If there is no synchronized clock for all agents and if

the processing speed of the agents is different, then no

deterministic algorithm can achieve consensus (even in

the presence of a single faulty agent)

[1] M.J. Fisher, N.A. Lynch, M.S. Paterson (1985)

21


CNR-IEIIT

Consensus and PageRank


CNR-IEIIT Graph of Agents

Consider a graph of agents which communicate using adecentralized communication pattern (similar to that forPageRank)

The value of agent i at time k is xi

Values of agents may be updated using a LVRA

x(k+1) = Aθ(k) x(k)


CNR-IEIIT Decentralized Communication Pattern

Stochastic process θ(k) {1, …, d} where d is thenumber of communication patterns

θ(k) is assumed to be i.i.d. with uniform probability

Prob{θ(k)=i} = 1/d

If at time k θ(k) = i then agent i initiates update

Technical differences with communication protocol forPageRank but the ideas are similar


CNR-IEIIT Communication Pattern for Agent 1

1 4

2

1

* 0 0 *

* * 0 0

0 0 * 0

0 0 0 *

A

3

A1 is row stochastic1

1/ 2 0 0 1/ 2

1/ 2 1/ 2 0 0

0 0 1 0

0 0 0 1

A


CNR-IEIIT Consensus

We say that consensus w.p.1 is achieved if for any initialcondition x(0) we have

Prob

for all agents i, j lim ( ) ( ) 0 1i j

kx k x k


CNR-IEIIT Convergence Properties

Lemma (convergence properties)

Assuming that the graph is strongly connected, thescheme

x(k+1) = Aθ(k) x(k)

achieves consensus w.p.1

Prob

for all agents i, j and for any initial condition x(0)

Y. Hatano, M. Mesbahi (2005); C. W. Wu (2006)A. Tahbaz-Salehi, A. Jadbabaie (2008)

lim ( ) ( ) 0 1i jk

x k x k

22


CNR-IEIIT PageRank and Consensus

Consensus PageRank

all agent values become equal page values converge to constant

strongly connected agent graph web is not strongly connected

Ai are row stochastic matrices Ai and Mi are column stochastic

self-loops in the agent graph no self-loops in the web

convergence for any x(0) x(0) stochastic vector

time averaging y(k) not necessary averaging y(k) required


CNR-IEIIT

Economic Complexity


CNR-IEIIT Economic Complexity

Introduced in the paper

Hidalgo C. and Hausmann R. “The Building Blocks ofEconomic Complexity,” Proceedings of the NationalAcademy of Science, 106, 2009


CNR-IEIIT Bipartite Graphs - 1

Bipartite graph representing countries and products

countries v productsxv={v1,v2,v3,v4,v5} xu={u1,u2,u3,u4}


CNR-IEIIT Bipartite Graphs - 2

Bipartite graph representing countries and products

countries v products

xv={v1,v2,v3,v4,v5} xu={u1,u2,u3,u4}

Construct a binary cross-product matrix M

Compute PageRank xv* and xu* representing the fitnessof countries and the quality of products


CNR-IEIIT Economic Complexity

Tripartite graph representing countries, capabilities andproducts

theory of

hidden

capabilities

Study convergence of algorithms for PageRank

xv(k+1) = xv(k) and xu(k+1) = xu(k)

23


CNR-IEIIT

Distributed Algorithms for Web Aggregation


CNR-IEIIT

Aggregation, PageRank and Complexity

Aggregation is a tool to break complexity

Applications:

o webo power grido biologyo social networkso journalso e-commerce


CNR-IEIIT PageRank and Web Aggregation

View the web as a network of agents connected via links

Web aggregation

Approximate computation

of PageRank to reduce

computation cost and

communication at servers

Guaranteed bounds on the

approximation error

H. Ishii, R. Tempo, E.-W. Bai (2012)ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Graph Representation

1 4

2

6

3

5

0 1/2 0 0 0 0

1/ 2 0 1/3 0 0 0

0 1/2 0 1/3 0 0

1/ 2 0 1/3 0 0 1/2

0 0 0 1/3 0 1/2

0 0 1/3 1/3 1 0

A

T* 0.0614 0.0857 0.122 0.214 0.214 0.302x


CNR-IEIIT Web Aggregation

1 4

2

6

3

5

0 1/2 0 0 0 0

1/ 2 0 1/3 0 0 0

0 1/2 0 1/3 0 0

1/ 2 0 1/3 0 0 1/2

0 0 0 1/3 0 1/2

0 0 1/3 1/3 1 0

A

T* 0.0614 0.0857 0.122 0.214 0.214 0.302x U1

U2

U3


CNR-IEIIT Aggregation of Agents

1 4

2

6

3

5

U1

U2

U3

U3

U2

U1

24



Most links are intra‐host ones

Two-step approach:

1. Global step: Compute total PageRank for each group

2. Local step: Distribute value among group members



Singular perturbation

methods for PageRank

computation

Web pages having many external outlinks are consideredas single groups

number external outlinksδ

number outlinks

J.H. Chow, P.V. Kokotovic (1985)H. Ishii, R. Tempo, E.W. Bai (2012)


CNR-IEIIT Coordinate Change for PageRank

where is the aggregated PageRank (based on groupvalue), V1 and V2 are block diagonal matrices (afterreordering)

Objective: design a distributed randomized algorithm forcomputing approximately PageRank x* based on thecomputation of

* *x Vx*

1* *1*

22

Vxx x

Vx

*1x

*1x


CNR-IEIIT

PageRank equation

x* = (1- m) A x* + m/n 1

Internal: block diag stochastic

External: block diag norm bounded by 2

internalcommunications in the same group

external communications

inter-groups

PageRank Equation: Internal/External Communications


CNR-IEIIT

11

21 2

* *1 12

2

1* *2 2

(1 )0

A

A A

ux xA mm

nx x

New PageRank equation

where u = V1 1

is approximate PageRank

Approximate PageRank Equation

A A1 111A AI2 221 22

0(1 )

0

ux xA mm

nx xA A

Ax


CNR-IEIIT

A11 1

A1 (( 1) (1 ))

0A x

umx k m

nk

1. Group values: reduced order computation

2. Iteration

communications inter-groups

communications inter-groups

Aggregated PageRank Algorithm - 1

A A 1A A22 12 22 1( 1) ( ) ((1 )[ 1 )( ) ]x k x k m I- A A x km

A22A

A11 1 ( )A x k

A21 1 ( )A x k

communications

in the same group

25


CNR-IEIIT

3. Inverse state transformation

Theorem (convergence properties):

xA(k) xA for k in MSE sense provided that

xA (0) is a stochastic vector

1A A( ) ( )Vx k x k

Aggregated PageRank Algorithm - 2

1V

block diagonal matrix


CNR-IEIIT Approximation Error

Approximate PageRank

Theorem (approximation error):

Given ε > 0, take

then

A A1 111A AI2 221 22

0(1 )

0

ux xA mm

nx xA A

A 1 Ax V x

number external outlinks 4δ

number outlinks 4(1 )(1 )m

* A1|| - ||x x


CNR-IEIIT

Centrality Measures


CNR-IEIIT

How central are specific nodes or edges in a network?

PageRank and Network Centrality Measures


CNR-IEIIT Other Centrality Measures

Degree

Closeness

Betweenness


CNR-IEIIT Degree Centrality

Degree centrality: for each node count the number ofincoming links

26


CNR-IEIIT Closeness Centrality

Closeness centrality: a node is more central if it is closerto most of the other nodes

Defined as the “distance” from all the other nodes

2 => 1 dist = 1

3 => 1 dist = 2

4 => 1 dist = 3

5 => 1 dist = 5

6 => 1 dist = 4

total = 15ICT International Doctoral School, Trento @RT 2015

CNR-IEIIT Closeness Centrality

Closeness centrality: a node is more central if it is closerto most of the other nodes

Defined as the “distance” from all the other nodes


CNR-IEIIT Betweenness Centrality

Betweenness centrality ranks a node higher if it belongsto many paths between other nodes

1

# 1, 1B

# , 1

shortest paths i j passing through i j

shortest paths i j i j



2 => 3 0 4 => 5 0 4 => 6 0 5 => 6 0

2 => 4 1/2 total = 1/2 + 1/3 = 5/6

2 => 5 1/3

2 => 6 0

3 => 4 0

3 => 5 0

3 => 6 0

1

# 1, 1B

# , 1

shortest paths i j passing through i j

shortest paths i j i j




CNR-IEIIT Comparison of Centrality Measures

27


CNR-IEIIT

On-Going Work: Distributed Randomized Algorithms

Design distributed randomized algorithms for

centrality measures

Degree and closeness are “easy”

Betweenness computation is very hard for general

graphs


CNR-IEIIT

On-Going Work: Temporal Networks

Fixed number of nodes

Time-varying links

Need a new definition of PageRank

Design distributed randomized algorithms

Establish convergence properties

Goal: Handle web spam effectively

Avoid deliberate manipulation of search engines


CNR-IEIIT

References


CNR-IEIIT Key References

H. Ishii and R. Tempo, “The PageRank Problem, Multi-Agent Consensus and Web Aggregation,,” IEEE CSM,2014

R. Tempo, G. Calafiore, F. Dabbene, “RandomizedAlgorithms for Analysis and Control of UncertainSystems, with Applications,” Second Edition,Springer-Verlag, London, 2013

… more references on my website…


CNR-IEIIT PageRank (Other References)

O. Fercoq, M. Akian, M. Bouhtou, S. Gaubert, “PageRank,Markov Chains and Ergodic Control,” IEEE TAC, 2013

W.-X. Zhao, H.-F. Chen, H.T. Fang, “Almost-Sure Convergenceof the Randomized Scheme,” IEEE TAC, 2013

A.V. Nazin and B.T. Polyak. “Randomized Algorithm toDetermine the Eigenvector of a Stochastic Matrix,” Automationand Remote Control, 2011


CNR-IEIIT Eigenfactor References

J.D. West, T.C. Bergstrom, C.T. Bergstrom, “The EigenfactorMetrics: A Network Approach to Assessing Scholarly Journals,”College of Research Libraries, 2010

C.T. Bergstrom, “Eigenfactor: Measuring the Value and Prestigeof Scholarly Journals,” C&RL News, 2007

L. Leydesdorff, “How are New Citation-Based Journal IndicatorsAdding to the Bibliometric Toolbox?,” Journal of the AmericanSociety for Information Science and Technology, 2008

pagerank, centrality measures and consensus: a systems and ...assets.disi.unitn.it › ... ›...

Documents