romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · the main objects real-world web-graph g...

Models of Random Graphs and their Applicationsto the Web-graph analysis

Andrei Raigorodskii

Lomonosov Moscow State University,Moscow Institute of Physics and Technology,

Yandex,Moscow, Russia

RUSSIR, 25 - 29 August, 2015

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 1 / 36

The main objects

Real-world web-graphG = (V;E), where V —


The main objects


set of web-pages,


The main objects


set of web-pages,

set of web-sites,


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,


Why do we need a model?


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,


Why do we need a model?Many reasons!


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,



Adjust algorithms;


The main objects


set of web-pages,

set of web-sites,

set of web-hosts,



Adjust algorithms;

Find unexpected structures (news, spam, etc.) using classifiers learnt onsome features coming from models.


How to construct a model?



Statistics

First, find some statistical properties of web-graphs that would describe mostaccurately the real-world structures.



Statistics

First, find some statistical properties of web-graphs that would describe mostaccurately the real-world structures.

Probability Theory

Then, take a random element G which takes values in a set of graphs on nvertices and has such a distribution that w.h.p. (with high probability, i.e., withprobability approaching 1 as n → ∞) G has the same properties as the onesmentioned above.


Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.


Some properties


Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.


Some properties



Web-graphs have a unique “giant” connected component.


Some properties




Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).


Some properties





Web-graphs are robust when random vertices are destroyed (a giantcomponent survives).


Some properties






Web-graphs are vulnerable to attacks onto hubs (many small componentsappear after a threshold is surpassed).


Some properties







Many triangles — high clusternig.


Some properties







Many triangles — high clusternig.

The degree distribution is close to a power-law:

|{v ∈ V : deg v = d}|n ∼ constd ;where ∈ (2; 3) depends on what we mean by web-graph.


Erdos–Renyi model

Fix n ∈ N.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].


Take these edges independently, each with probability p.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].


Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].


Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].



Of course intuitively G(n; p) does not reflect the reality. At least, it seemsunnatural to take the edges independently.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].




However, if p(n) = cn , then the expected number of edges in G(n; p) is cn−12 ,

so that we get sparsity. And the diameter is known to be small enough.


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].






But in this case the distribution of the degrees is binomial with parametersn− 1; p, which is close, in turn, to Poisson with the expectation c. Not apower law at all!


Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].






But in this case the distribution of the degrees is binomial with parametersn− 1; p, which is close, in turn, to Poisson with the expectation c. Not apower law at all!

Would be great to explain somehow the power law phenomenon, not loosingthe sparsity and the diameter property.


Barabasi–Albert “model”



Fix an initial graph G0.




Add new vertices (web sites) one by one.





Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.






This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.







Obviously we have sparsity: n vertices and mn edges.








Barabasi and Albert claim that this model has, w.h.p., small diameter and —the most important — follows a power law.









Another pair of mathematicians...









Another pair of mathematicians... Bollobas and Riordan criticised — not theidea! — its realization.



First, of course the random graph depends greatly on the initial graph G0.




Second, Barabasi and Albert do not say any word on how is distributed theset of m targets of a new vertex among the old ones.





Finally, B and R prove that, roughly speaking, one can organize the process(take a G0 and a distribution of targets) in such a way that one gets, w.h.p.,an arbitrary number of triangles in the resulting random graph.





Finally, B and R prove that, roughly speaking, one can organize the process(take a G0 and a distribution of targets) in such a way that one gets, w.h.p.,an arbitrary number of triangles in the resulting random graph.

In principle, it’s good to have a class of models instead of just one model.The problem is that one can hardly understand from the papers by B and Afor which concrete model from this class the results are stated. Moreover,sometimes the results of different papers are contradictory (cannot be truesimultaneously).


Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.



Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.




Case m = 1G11 — graph with one vertex v1 and one loop.





Given Gn−11 we can make Gn

1 by adding vertex vn and an edge from it to a vertexvi, picked from {v1; : : : ; vn} with probability

P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nAndrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36






P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nPreferential attachment!







P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11


Case m > 1

Given Gmn1 we can make Gnm by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}

into v′2, and so on.







P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11


Case m > 1

Given Gmn1 we can make Gnm by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}

into v′2, and so on.

The random graph Gnm is certainly sparse. What’s about other properties?


Diameter and giant components



Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .





Great, since for real values of n, we get lnnln lnn ∈ [5; 15].







If p ∈ (0; 1) and we make a random subgraph Gnm;p of the graph Gnm by deletingits vertices independently each with probability p, then w.h.p. Gnm;p contains aconnected component of size � n.








Great, since we have the robustness property.










If c ∈ (0; 1) and we make a random subgraph Gnm;c of the graph Gnm by deletingits [cn] first vertices, then for c 6 (m− 1)=(m+ 1), w.h.p. Gnm;c contains aconnected component of size � n, and for c > (m− 1)=(m+ 1), w.h.p. all theconnected components of Gnm;c are of size o(n).










If c ∈ (0; 1) and we make a random subgraph Gnm;c of the graph Gnm by deletingits [cn] first vertices, then for c 6 (m− 1)=(m+ 1), w.h.p. Gnm;c contains aconnected component of size � n, and for c > (m− 1)=(m+ 1), w.h.p. all theconnected components of Gnm;c are of size o(n).

Great, since we have the vulnerability to attacks on the hubs.Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Degree distribution


Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:


Degree distribution




Great, since we get a power-law.


Degree distribution





Not too great, since the exponent in the power-law is a bit different from theexperimental ones ( ∈ (2; 3)).


Degree distribution






Bad, since d 6 n1=15, which is non-realistic.


Degree distribution







The last problem is completely solved: analog of B–R–S–T-theorem with anarbitrary d.


Degree distribution







The last problem is completely solved: analog of B–R–S–T-theorem with anarbitrary d.Tune the model somehow to get other exponents in the power-law?


Clustering


Clustering

Let G = (V;E), v ∈ V . Let Nv be the set of neighbours of v in G. Letnv = |Nv|. If nv > 2, thenCv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :


Clustering


Global clustering coefficient

The global clustering coefficient of G isT (G) =

∑v∈V C2nvCv∑v∈V C2nv :


Clustering


Global clustering coefficient

The global clustering coefficient of G isT (G) =

∑v∈V C2nvCv∑v∈V C2nv :

Let ](H;G) be the number of copies of a graph H in a graph G. ThenT (G) =3](K3; G)](P2; G)

;where K3 is a triangle and P2 is a 2-path.


Clustering


Clustering

Average local clustering coefficient

The average local clustering coefficient of G isC(G) =1

|V |∑v∈V Cv ;

where, again, Cv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :


Clustering



|V |∑v∈V Cv ;


The quantities T (G) and C(G) are quite different.


Clustering



|V |∑v∈V Cv ;



Let G be K2;n−2 plus one edge between the vertices in the part of size 2. ThenC(G) ∼ 1, but T (G) = Θ(

1n).Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Clustering



|V |∑v∈V Cv ;



Let G be K2;n−2 plus one edge between the vertices in the part of size 2. ThenC(G) ∼ 1, but T (G) = Θ(

1n).Very important! However, many inaccuracies in the literature.


Clustering: experiments and models



Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ).



Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations.



Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).




Theorem (Ostroumova, Samosvat)

If in a sequence {Gn} of graphs, the degrees of the vertices follow a power lawwith exponent ∈ (2; 3), then T (Gn) → 0 as n → ∞.






Newman might be right, provided we do take into account multiple edges andloops.







Theorem (Ostroumova)

There exist sequences {Gn} of multigraphs with loops, whose degrees of thevertices follow a power law with exponent ∈ (2; 3) and, nevertheless,T (Gn) > const as n → ∞.







Theorem (Ostroumova)

There exist sequences {Gn} of multigraphs with loops, whose degrees of thevertices follow a power law with exponent ∈ (2; 3) and, nevertheless,T (Gn) > const as n → ∞.

However, what is T (G), if G has multiple edges and loops? Many differentdefinitions, and Newman does not say a word about this subtlety!


Clustering: the B–R model




The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .





To calculate the value E(T (Gnm)), one needs to know the number oftriangles and the number of 2-paths.






Recall that ](H;G) is the number of copies of a graph H in a graph G.







A general and nice result was proved by Ryabchenko and Samosvat.







A general and nice result was proved by Ryabchenko and Samosvat.

Theorem (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .


Clustering: comments on the theorems for the B–Rmodel



Theorem 1 (Bollobas, Riordan)






Theorem 2 (Ryabchenko, Samosvat)










Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.









By Theorem 2 the number of K4 etc. is asymptotically constant: bad.










Unfortunately, C(Gnm) also approaches 0: even worse.











And, once again, = 3, not ∈ (2; 3).











And, once again, = 3, not ∈ (2; 3).

So let’s tune the model and try to calculate again the number of smallsubgraphs!


Buckley–Osthus model



Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.




Case m = 1H1a;1 — graph with one vertex v1 and one loop.





Given Hn−1a;1 we can make Hna;1 by adding vertex vn and an edge from it to avertex vi, picked from {v1; : : : ; vn} with probability

P(i = s) =

dHn−1a;1 (vs)+a−1

(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n






P(i = s) =


(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n

For a = 1, we get the model of Bollobas–Riordan.






P(i = s) =


(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n

For a = 1, we get the model of Bollobas–Riordan.

Case m > 1

Given Hmna;1 we can make Hna;m by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}into v′2, and so on.


Buckley–Osthus model: degree distribution



Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:




If d 6 n1=(100(a+1)), then w.h.p.


Great, since now we can tune the model to get the expected exponent.




If d 6 n1=(100(a+1)), then w.h.p.



Bad, since d 6 n1=(100(a+1)).




If d 6 n1=(100(a+1)), then w.h.p.



Bad, since d 6 n1=(100(a+1)).

Completely solved.




If d 6 n1=(100(a+1)), then w.h.p.



Bad, since d 6 n1=(100(a+1)).

Completely solved.

Many more further great features of the model!


Buckley–Osthus model: second degrees of vertices



Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:



Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.



Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.Theorem (Ostroumova, Grechnikov, Kupavskiy, Tetali)

W.h.p.|{i = 1; : : : ; n : d2(i) = d}|n ∼ const(a)da+1

:Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36


Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.Theorem (Ostroumova, Grechnikov, Kupavskiy, Tetali)

W.h.p.|{i = 1; : : : ; n : d2(i) = d}|n ∼ const(a)da+1

:Fits quite well to the real data.


Buckley–Osthus model: the number of edgesbetween vertices of given degrees



Let Xn(d1; d2) be the total number of edges between vertices of given degrees.



Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!



Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!Very important, since

|{v ∈ Vn : deg v = d}| =1d∑d1

Xn(d1; d):Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36


Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!Very important, since

|{v ∈ Vn : deg v = d}| =1d∑d1

Xn(d1; d):Theorem (Grechnikov)

W.h.p. Xn(d1; d2)n ∼ c(a;m)

(

(d1 + d2)1−ad2

1d22

) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Buckley–Osthus model: “power and glory”

Theorem (Grechnikov)

Let d1 > m and d2 > m. Let X = Xn(d1; d2). There exists a function cX(d1; d2)such that

EXn(d1; d2) = cX(d1; d2)n +Oa;m(1)

andcX(d1; d2) =Γ(d1 −m+ma)Γ(d2 −m+ma)

Γ(d1 −m+ma+ 2)Γ(d2 −m+ma + 2)×

× Γ(d1 + d2 − 2m+ 2ma+ 3)

Γ(d1 + d2 − 2m+ 2ma+ a + 2)ma(a+ 1)

Γ(ma+ a + 1)

Γ(ma) ×

×(

1 + �(d1; d2)(d1 −m+ma + 1)(d2 −m +ma+ 1)

(d1 + d2 − 2m+ 2ma+ 1)(d1 + d2 − 2m+ 2ma+ 2)

) ;where

−4 +2

1 +ma 6 �(d1; d2) 6 aΓ(ma+ 1)Γ(2ma+ a+ 3)

Γ(2ma+ 2)Γ(ma + a+ 2):


Bollobas–Riordan model: “power and glory”

Theorem (Grechnikov)

If d1 < k, d2 < k or d1 = d2 = k, then X = 0. If d1 > k; d2 > k andd1 + d2 > 2k + 1, then the expected value of X is

EX =k(k + 1)d1(d1 + 1)d2(d2 + 1)

(

1 −Ck+1

2k+2Cd1−kd1+d2−2kCd1+1d1+d2+2

)

(2kt+ 1)−

−k∑n=1

Cd1−nd1+d2−2nd1d2Cd1d1+d2

(

(2n)!n!(n+ 1)!

k + 1

2k + [n = k] (2k)!2(k − 1)!2

)

−

− [d1 = k] (k − 1)(k + 1)

2kd2(d2 + 1)− [d2 = k] (k − 1)(k + 1)

2kd1(d1 + 1)+Ok;d1;d2

(

1t) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 21 / 36

Buckley–Osthus model: one more evidence of itsquality



Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?



Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?We may try to find an optimal a by comparing the reality with the fact that thenumber of vertices of degree d is close to d−2−a (Grechnikov).




We may try to find an optimal a by comparing the reality with the fact that thenumber of edges between vertices of degree d1 and d2 is close to(d1 + d2)

1−ad−21 d−2

2 (Grechnikov).




We may try to find an optimal a by comparing the reality with the fact that thenumber of edges between vertices of degree d1 and d2 is close to(d1 + d2)

1−ad−21 d−2

2 (Grechnikov).

Assertion (Grechnikov, Zhukovskii, Vinogradov, Ostroumova, Pritykin,Gusev, Raigorodskii)

In both cases, the optimum is at the same a ≈ 0:27.


Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?




Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?





An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).





An algorithm


For each pair of vertices of H calculate the expected number of edgesbetween them using Step 1 and Grechnikov’s results.





An algorithm



Sum all the values found at Step 2.





An algorithm




Compare the result of Step 3 with the real number of edges in H .





An algorithm




Compare the result of Step 3 with the real number of edges in H .

The difference between the real and the expected values can be used as a feature.


Buckley–Osthus model: clustering



Theorem (Eggemann, Noble)

If a > 1, then E(](K3; Hna;m)) � lnn as n → ∞.





It’s remarkable that for a = 1 (i.e., for the B–R model) we had ln2 n instead oflnn.







If a > 1, then E(](P2; Hna;m)) � n as n → ∞.







If a > 1, then E(](P2; Hna;m)) � n as n → ∞.


If a > 1, then E(T (Hna;m)) � lnnn as n → ∞.


Buckley–Osthus model: small subgraphs



Very recently Tilga has proved far-reaching generalizations and refinements of thetheorems by Bollobas–Riordan, Ryabchenko–Samosvat, and Eggemann–Noble.




Theorem (Tilga)

For any a > 0 and any fixed graph F , the order of magnitude of E(](F;Hna;m)) isfound.




Theorem (Tilga)

For any a > 0 and any fixed graph F , the order of magnitude of E(](F;Hna;m)) isfound.

The exact statement is quite cumbersome involving many parameters and cases.So we just give several most important and short enough corollaries.


Buckley–Osthus model: paths and cliques



Theorem (Tilga)

Let m > 2 and a < 1, � = 1a+1 . Let Pl be a path of length l. Then for n → ∞,

E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:



Theorem (Tilga)


E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk be a clique of size k, where 4 6 k 6 m + 1. Then for n → ∞,

E(](Kk; Hna;m)

)

=

n1+(�−1)(k−1) · Θ(mC2k) for a < 1k−2 ;lnn · Θ(mC2k) for a = 1k−2 ;Θ(mC2k) for a > 1k−2 :



Theorem (Tilga)


E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk be a clique of size k, where 4 6 k 6 m + 1. Then for n → ∞,

E(](Kk; Hna;m)

)

=

n1+(�−1)(k−1) · Θ(mC2k) for a < 1k−2 ;lnn · Θ(mC2k) for a = 1k−2 ;Θ(mC2k) for a > 1k−2 :

For example, if a = 13 (close to 0:27), then the number of K5 is about logn, and

the number of K4 is about 4√n. Much more realistic than in the B–R model!


Buckley–Osthus model: cycles and bicliques



Theorem (Tilga)

Let Cl be a cycle of length l. Then for n → ∞,

E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:



Theorem (Tilga)


E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk;l be a biclique with 2 6 l 6 min{k;m}. Then for n → ∞,

E(](Kk;l; Hna;m)

)

=

nk(1+(�−1)l) · Θ(mkl) for a < 1l−1 ;(lnn)k · Θ(mkl) for a = 1l−1 ;Θ(mkl) for a > 1l−1 :



Theorem (Tilga)


E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk;l be a biclique with 2 6 l 6 min{k;m}. Then for n → ∞,

E(](Kk;l; Hna;m)

)

=

nk(1+(�−1)l) · Θ(mkl) for a < 1l−1 ;(lnn)k · Θ(mkl) for a = 1l−1 ;Θ(mkl) for a > 1l−1 :

The number of bicliques shows how many communities are formed. For example,if a = 1

3 (close to 0:27), then there are many Kk;4 and a lot of Kk;3, which wasimpossible in the B–R model (there are no vertices of degree < 3 in such graphs).


Classical “Google” PageRankGn = (Vn; En) — web-graph.


Classical “Google” PageRankGn = (Vn; En) — web-graph.Classical “Google” PageRank is the solution �(n) = (�i(n))i=1;:::;|Vn| to thesystem �i(n) = c∑j→i �j(n)

outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:



outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:

The coordinates �i(n) follow a power-law similarly to the degrees (with anotherparameter of the law).



outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:


The same is true in a Barabasi–Albert model.



outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:


The same is true in a Barabasi–Albert model.

Theorem (Avrachenkov)

For i > 0

E�i(n) =1 − c1 + n ( 1

1 + c +cΓ (i+ 1

2

)

Γ(n + c

2 + 1)

(1 + c)Γ (i+ c2 + 1

)

Γ(n + 1

2

)

)

≈

≈ 1 − c1 + n ( 1

1 + c +c

1 + c (i +1

2

)− 1+c2(n +

1

2

)1+c

2

) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).



To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.




We have three parameters to be adjusted optimally.





Two main parameters

An l-dimensional vector !, which gives us a weight of a vertex i ∈ V : theweight is f(!; i) = (!;vi) — scalar product.





Two main parameters


An m-dimensional vector ', which gives us a weight of an edge i → j in E:the weight is g('; i → j) = ('; ei;j).





Two main parameters


An m-dimensional vector ', which gives us a weight of an edge i → j in E:the weight is g('; i → j) = ('; ei;j).

The third parameter c ∈ (0; 1) will appear on the next slide.


Weighted “Yandex” PageRank

Main definition

Weighted PageRank is a vector � — the solution to a system � = A�, where A isa matrix with entriesAi;j = c f(!; j)

∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:



Main definition


∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:Main problem

Let S be a measure of the difference between our PageRank �(!; '; c) and someestimates assigned to each document (to each i ∈ V ) according to a given searchquery (e.g., S is the standard deviation). Find

(!0; '0; c0) = argmin!;'S(�(!; '; c); vector of estimates):Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 30 / 36


Main definition


∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:Main problem

Let S be a measure of the difference between our PageRank �(!; '; c) and someestimates assigned to each document (to each i ∈ V ) according to a given searchquery (e.g., S is the standard deviation). Find

(!0; '0; c0) = argmin!;'S(�(!; '; c); vector of estimates):Work in progress. Recent advances are made in cooperation with Nesterov and hisresearch group.


A new general class of models



Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?




Many multiparametric models.





A break-through is due to Ryabchenko, Samosvat, Ostroumova.





A break-through is due to Ryabchenko, Samosvat, Ostroumova.

The PA-class

Let Gnm (n > n0) be a graph with n vertices {1; : : : ; n} and mn edges obtained asa result of the following random graph process. We start at the time n0 from anarbitrary graph Gn0m with n0 vertices and mn0 edges. On the (n+ 1)-th step(n > n0), we make the graph Gn+1m from Gnm by adding a new vertex n + 1 andm edges connecting this vertex to some m vertices from the set {1; : : : ; n; n+ 1}.Denote by dnv the degree of a vertex v in Gnm. Assume that for some constants Aand B the following conditions are satisfied:


A new general class of models: continuation

The PA-class conditions

P(dn+1v = dnv | Gnm) = 1 −Adnvn −B 1n +O((dnv )2n2

) ; 1 6 v 6 n ; (1)

P(dn+1v = dnv + 1 | Gnm) = Adnvn +B 1n +O((dnv )

2n2

) ; 1 6 v 6 n ; (2)

P(dn+1v = dnv + j | Gnm) = O((dnv )

2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)





) ; 1 6 v 6 n ; (1)


2n2

) ; 1 6 v 6 n ; (2)


2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)

For A = 1=2, B = 0, we get Bollobas–Riordan.





) ; 1 6 v 6 n ; (1)


2n2

) ; 1 6 v 6 n ; (2)


2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)

For A = 1=2, B = 0, we get Bollobas–Riordan.

For A = 1=(1 + a), B = ma=(1 + a), we get Buckley–Osthus.


A new general class of models: results and questions



Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :






If 2A < 1 then w.h.p. T (n) ∼ c(A;B;m) ;If 2A = 1 then w.h.p. T (n) ∼ c′(A;B;m)

lnn ;If 2A > 1 then for any " > 0 w.h.p. n1−2A−" 6 T (n) 6 n1−2A+" :








Great, since in the first case, we have a constant clustering together withpower-law!









Open questions and questions under solution

(Ostroumova, Krot) The local clustering coefficient is constant in n in thenew models and is proportional to 1=d in the degree d.









Open questions and questions under solution

(Ostroumova, Krot) The local clustering coefficient is constant in n in thenew models and is proportional to 1=d in the degree d.What’s with “Google”, “Yandex” and other PageRanks in the new models?


“Fresh” models


“Fresh” models

Another problem of the preferential attachment models: the older is a page, thegreater is the probability that it will gain more and more citations. That’s notcompletely realistic, for we have a large part of the Internet containing news andother media.


“Fresh” models


Need to use an age factor for each page.


“Fresh” models



Samosvat and Ostroumova proposed a class of such models.


“Fresh” models



Samosvat and Ostroumova proposed a class of such models.

Let’s construct a sequence of random graphs {Gn}. For that, consider an integerconstant m (vertex outdegree), an integer function N(n), and a sequence ofmutually independent random variables �1; �2; : : : with some given distributiontaking positive values.


“Fresh” models


“Fresh” models

First, take 2 vertices and 1 edge between them (graph Gn2 ). The first 2 vertices

have “inherent qualities” q(1) = �1, q(2) = �2. At the t+ 1-th step(2 6 t 6 n− 1) one vertex and m edges are added to Gnt :q(t+ 1) = �t+1; P(t + 1 → i) =

attrt(i)∑tj=1 attrt(j) ;

where

attrt(i) = (1 or q(i)) · (1 or dt(i)) · (1 or I[i > t−N(n)] or e− t−iN(n)

)

and dt(i) is the degree of the vertex i in Gnt .


“Fresh” models

First, take 2 vertices and 1 edge between them (graph Gn2 ). The first 2 vertices

have “inherent qualities” q(1) = �1, q(2) = �2. At the t+ 1-th step(2 6 t 6 n− 1) one vertex and m edges are added to Gnt :q(t+ 1) = �t+1; P(t + 1 → i) =

attrt(i)∑tj=1 attrt(j) ;

where

attrt(i) = (1 or q(i)) · (1 or dt(i)) · (1 or I[i > t−N(n)] or e− t−iN(n)

)

and dt(i) is the degree of the vertex i in Gnt .

Let Gn = Gnn.


“Fresh” models


“Fresh” models

Theorem (Samosvat, Ostroumova)

The most realistic case is when attrt(i) = q(i) · e− t−iN(n) and �i have a power law(say, Pareto) distribution. In this case, not only w.h.p. the fraction of the numberof vertices with a given degree follows a power law with a tunable exponent, butalso the quantity e(T ), which is the fraction of edges that connect vertices withage difference greater than T , decreases exponentially, right as in the reality.


“Fresh” models

Theorem (Samosvat, Ostroumova)

The most realistic case is when attrt(i) = q(i) · e− t−iN(n) and �i have a power law(say, Pareto) distribution. In this case, not only w.h.p. the fraction of the numberof vertices with a given degree follows a power law with a tunable exponent, butalso the quantity e(T ), which is the fraction of edges that connect vertices withage difference greater than T , decreases exponentially, right as in the reality.

The model is used in Yandex to substantially improve the crawling algorithm.


romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · the main objects real-world web-graph g...

Documents