romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · the main objects real-world web-graph g...

190
Models of Random Graphs and their Applications to the Web-graph analysis Andrei Raigorodskii Lomonosov Moscow State University, Moscow Institute of Physics and Technology, Yandex, Moscow, Russia RUSSIR, 25 - 29 August, 2015 Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 1 / 36

Upload: others

Post on 12-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Models of Random Graphs and their Applicationsto the Web-graph analysis

Andrei Raigorodskii

Lomonosov Moscow State University,Moscow Institute of Physics and Technology,

Yandex,Moscow, Russia

RUSSIR, 25 - 29 August, 2015

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 1 / 36

Page 2: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 3: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 4: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 5: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 6: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 7: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 8: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.

Why do we need a model?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 9: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.

Why do we need a model?Many reasons!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 10: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.

Why do we need a model?Many reasons!

Adjust algorithms;

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 11: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

The main objects

Real-world web-graphG = (V;E), where V —

set of web-pages,

set of web-sites,

set of web-hosts,

and E — the set of all hyperlinks between the vertices (nodes).Sometimes multiple edges are identified. Sometimes multiple edges and evenloops are allowed.

Why do we need a model?Many reasons!

Adjust algorithms;

Find unexpected structures (news, spam, etc.) using classifiers learnt onsome features coming from models.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36

Page 12: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

How to construct a model?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 3 / 36

Page 13: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

How to construct a model?

Statistics

First, find some statistical properties of web-graphs that would describe mostaccurately the real-world structures.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 3 / 36

Page 14: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

How to construct a model?

Statistics

First, find some statistical properties of web-graphs that would describe mostaccurately the real-world structures.

Probability Theory

Then, take a random element G which takes values in a set of graphs on nvertices and has such a distribution that w.h.p. (with high probability, i.e., withprobability approaching 1 as n → ∞) G has the same properties as the onesmentioned above.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 3 / 36

Page 15: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 16: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 17: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 18: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 19: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).

Web-graphs are robust when random vertices are destroyed (a giantcomponent survives).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 20: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).

Web-graphs are robust when random vertices are destroyed (a giantcomponent survives).

Web-graphs are vulnerable to attacks onto hubs (many small componentsappear after a threshold is surpassed).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 21: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).

Web-graphs are robust when random vertices are destroyed (a giantcomponent survives).

Web-graphs are vulnerable to attacks onto hubs (many small componentsappear after a threshold is surpassed).

Many triangles — high clusternig.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 22: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Some properties

Barabasi–Albert, Watts–Strogatz, Newman, and many others in 90s–00s.

Web-graphs are sparse, i.e., their numbers of edges (links) are proportional totheir numbers of vertices.

Web-graphs have a unique “giant” connected component.

Every two vertices in the giant component are connected by a path of shortlength (5–6, 15–20 depending on what we mean by web-graph): diamG ≈ 6(the rule of 6 handshakes).

Web-graphs are robust when random vertices are destroyed (a giantcomponent survives).

Web-graphs are vulnerable to attacks onto hubs (many small componentsappear after a threshold is surpassed).

Many triangles — high clusternig.

The degree distribution is close to a power-law:

|{v ∈ V : deg v = d}|n ∼ constd ;where ∈ (2; 3) depends on what we mean by web-graph.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36

Page 23: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 24: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 25: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 26: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 27: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 28: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 29: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 30: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.

Of course intuitively G(n; p) does not reflect the reality. At least, it seemsunnatural to take the edges independently.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 31: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.

Of course intuitively G(n; p) does not reflect the reality. At least, it seemsunnatural to take the edges independently.

However, if p(n) = cn , then the expected number of edges in G(n; p) is cn−12 ,

so that we get sparsity. And the diameter is known to be small enough.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 32: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.

Of course intuitively G(n; p) does not reflect the reality. At least, it seemsunnatural to take the edges independently.

However, if p(n) = cn , then the expected number of edges in G(n; p) is cn−12 ,

so that we get sparsity. And the diameter is known to be small enough.

But in this case the distribution of the degrees is binomial with parametersn− 1; p, which is close, in turn, to Poisson with the expectation c. Not apower law at all!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 33: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Erdos–Renyi model

Fix n ∈ N.

Let Vn = {1; 2; : : : ; n}, m = C2n.

Fix p ∈ [0; 1].

Consider all the edges e1; : : : ; em of the complete graph with vertices Vn.

Take these edges independently, each with probability p.Denote the resulting random graph by G(n; p).Now let n → ∞. We may vary p depending on n.

Of course intuitively G(n; p) does not reflect the reality. At least, it seemsunnatural to take the edges independently.

However, if p(n) = cn , then the expected number of edges in G(n; p) is cn−12 ,

so that we get sparsity. And the diameter is known to be small enough.

But in this case the distribution of the degrees is binomial with parametersn− 1; p, which is close, in turn, to Poisson with the expectation c. Not apower law at all!

Would be great to explain somehow the power law phenomenon, not loosingthe sparsity and the diameter property.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36

Page 34: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 35: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 36: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 37: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 38: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 39: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.

Obviously we have sparsity: n vertices and mn edges.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 40: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.

Obviously we have sparsity: n vertices and mn edges.

Barabasi and Albert claim that this model has, w.h.p., small diameter and —the most important — follows a power law.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 41: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.

Obviously we have sparsity: n vertices and mn edges.

Barabasi and Albert claim that this model has, w.h.p., small diameter and —the most important — follows a power law.

Another pair of mathematicians...

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 42: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

Fix an initial graph G0.

Add new vertices (web sites) one by one.

Any new vertex sends m edges to some previous vertices with probabilitiesproportional to the degrees of these vertices.

This is called “preferential attachment”: a new site prefers to send a link tothe existing sites according to their current popularity.

Obviously we have sparsity: n vertices and mn edges.

Barabasi and Albert claim that this model has, w.h.p., small diameter and —the most important — follows a power law.

Another pair of mathematicians... Bollobas and Riordan criticised — not theidea! — its realization.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36

Page 43: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

First, of course the random graph depends greatly on the initial graph G0.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 7 / 36

Page 44: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

First, of course the random graph depends greatly on the initial graph G0.

Second, Barabasi and Albert do not say any word on how is distributed theset of m targets of a new vertex among the old ones.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 7 / 36

Page 45: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

First, of course the random graph depends greatly on the initial graph G0.

Second, Barabasi and Albert do not say any word on how is distributed theset of m targets of a new vertex among the old ones.

Finally, B and R prove that, roughly speaking, one can organize the process(take a G0 and a distribution of targets) in such a way that one gets, w.h.p.,an arbitrary number of triangles in the resulting random graph.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 7 / 36

Page 46: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Barabasi–Albert “model”

First, of course the random graph depends greatly on the initial graph G0.

Second, Barabasi and Albert do not say any word on how is distributed theset of m targets of a new vertex among the old ones.

Finally, B and R prove that, roughly speaking, one can organize the process(take a G0 and a distribution of targets) in such a way that one gets, w.h.p.,an arbitrary number of triangles in the resulting random graph.

In principle, it’s good to have a class of models instead of just one model.The problem is that one can hardly understand from the papers by B and Afor which concrete model from this class the results are stated. Moreover,sometimes the results of different papers are contradictory (cannot be truesimultaneously).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 7 / 36

Page 47: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 48: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 49: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Case m = 1G11 — graph with one vertex v1 and one loop.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 50: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Case m = 1G11 — graph with one vertex v1 and one loop.

Given Gn−11 we can make Gn

1 by adding vertex vn and an edge from it to a vertexvi, picked from {v1; : : : ; vn} with probability

P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nAndrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 51: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Case m = 1G11 — graph with one vertex v1 and one loop.

Given Gn−11 we can make Gn

1 by adding vertex vn and an edge from it to a vertexvi, picked from {v1; : : : ; vn} with probability

P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nPreferential attachment!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 52: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Case m = 1G11 — graph with one vertex v1 and one loop.

Given Gn−11 we can make Gn

1 by adding vertex vn and an edge from it to a vertexvi, picked from {v1; : : : ; vn} with probability

P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nPreferential attachment!

Case m > 1

Given Gmn1 we can make Gnm by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}

into v′2, and so on.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 53: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model

Construct a random graph Gnm with n vertices and mn edges, m ∈ N.Let dG(v) be the degree of a vertex v in a graph G.

Case m = 1G11 — graph with one vertex v1 and one loop.

Given Gn−11 we can make Gn

1 by adding vertex vn and an edge from it to a vertexvi, picked from {v1; : : : ; vn} with probability

P(i = s) =

{ dGn−11

(vs)2n−1 1 6 s 6 n− 11

2n−1 s = nPreferential attachment!

Case m > 1

Given Gmn1 we can make Gnm by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}

into v′2, and so on.

The random graph Gnm is certainly sparse. What’s about other properties?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36

Page 54: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 55: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 56: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Great, since for real values of n, we get lnnln lnn ∈ [5; 15].

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 57: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Great, since for real values of n, we get lnnln lnn ∈ [5; 15].

Theorem (Bollobas, Riordan)

If p ∈ (0; 1) and we make a random subgraph Gnm;p of the graph Gnm by deletingits vertices independently each with probability p, then w.h.p. Gnm;p contains aconnected component of size � n.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 58: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Great, since for real values of n, we get lnnln lnn ∈ [5; 15].

Theorem (Bollobas, Riordan)

If p ∈ (0; 1) and we make a random subgraph Gnm;p of the graph Gnm by deletingits vertices independently each with probability p, then w.h.p. Gnm;p contains aconnected component of size � n.

Great, since we have the robustness property.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 59: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Great, since for real values of n, we get lnnln lnn ∈ [5; 15].

Theorem (Bollobas, Riordan)

If p ∈ (0; 1) and we make a random subgraph Gnm;p of the graph Gnm by deletingits vertices independently each with probability p, then w.h.p. Gnm;p contains aconnected component of size � n.

Great, since we have the robustness property.

Theorem (Bollobas, Riordan)

If c ∈ (0; 1) and we make a random subgraph Gnm;c of the graph Gnm by deletingits [cn] first vertices, then for c 6 (m− 1)=(m+ 1), w.h.p. Gnm;c contains aconnected component of size � n, and for c > (m− 1)=(m+ 1), w.h.p. all theconnected components of Gnm;c are of size o(n).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 60: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Diameter and giant components

Theorem (Bollobas, Riordan)

If m > 2, then w.h.p. diamGnm ∼ lnnln lnn .

Great, since for real values of n, we get lnnln lnn ∈ [5; 15].

Theorem (Bollobas, Riordan)

If p ∈ (0; 1) and we make a random subgraph Gnm;p of the graph Gnm by deletingits vertices independently each with probability p, then w.h.p. Gnm;p contains aconnected component of size � n.

Great, since we have the robustness property.

Theorem (Bollobas, Riordan)

If c ∈ (0; 1) and we make a random subgraph Gnm;c of the graph Gnm by deletingits [cn] first vertices, then for c 6 (m− 1)=(m+ 1), w.h.p. Gnm;c contains aconnected component of size � n, and for c > (m− 1)=(m+ 1), w.h.p. all theconnected components of Gnm;c are of size o(n).

Great, since we have the vulnerability to attacks on the hubs.Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36

Page 61: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 62: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 63: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Great, since we get a power-law.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 64: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Great, since we get a power-law.

Not too great, since the exponent in the power-law is a bit different from theexperimental ones ( ∈ (2; 3)).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 65: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Great, since we get a power-law.

Not too great, since the exponent in the power-law is a bit different from theexperimental ones ( ∈ (2; 3)).

Bad, since d 6 n1=15, which is non-realistic.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 66: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Great, since we get a power-law.

Not too great, since the exponent in the power-law is a bit different from theexperimental ones ( ∈ (2; 3)).

Bad, since d 6 n1=15, which is non-realistic.

The last problem is completely solved: analog of B–R–S–T-theorem with anarbitrary d.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 67: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Degree distribution

Theorem (Bollobas, Riordan, Spencer, Tusnady)

If d 6 n1=15, then w.h.p.

|{v ∈ Gnm : deg v = d}|n ∼ const(m)d3:

Great, since we get a power-law.

Not too great, since the exponent in the power-law is a bit different from theexperimental ones ( ∈ (2; 3)).

Bad, since d 6 n1=15, which is non-realistic.

The last problem is completely solved: analog of B–R–S–T-theorem with anarbitrary d.Tune the model somehow to get other exponents in the power-law?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36

Page 68: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 11 / 36

Page 69: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Let G = (V;E), v ∈ V . Let Nv be the set of neighbours of v in G. Letnv = |Nv|. If nv > 2, thenCv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 11 / 36

Page 70: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Let G = (V;E), v ∈ V . Let Nv be the set of neighbours of v in G. Letnv = |Nv|. If nv > 2, thenCv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

Global clustering coefficient

The global clustering coefficient of G isT (G) =

∑v∈V C2nvCv∑v∈V C2nv :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 11 / 36

Page 71: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Let G = (V;E), v ∈ V . Let Nv be the set of neighbours of v in G. Letnv = |Nv|. If nv > 2, thenCv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

Global clustering coefficient

The global clustering coefficient of G isT (G) =

∑v∈V C2nvCv∑v∈V C2nv :

Let ](H;G) be the number of copies of a graph H in a graph G. ThenT (G) =3](K3; G)](P2; G)

;where K3 is a triangle and P2 is a 2-path.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 11 / 36

Page 72: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Page 73: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Average local clustering coefficient

The average local clustering coefficient of G isC(G) =1

|V |∑v∈V Cv ;

where, again, Cv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Page 74: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Average local clustering coefficient

The average local clustering coefficient of G isC(G) =1

|V |∑v∈V Cv ;

where, again, Cv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

The quantities T (G) and C(G) are quite different.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Page 75: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Average local clustering coefficient

The average local clustering coefficient of G isC(G) =1

|V |∑v∈V Cv ;

where, again, Cv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

The quantities T (G) and C(G) are quite different.

Let G be K2;n−2 plus one edge between the vertices in the part of size 2. ThenC(G) ∼ 1, but T (G) = Θ(

1n).Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Page 76: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering

Average local clustering coefficient

The average local clustering coefficient of G isC(G) =1

|V |∑v∈V Cv ;

where, again, Cv =|{{x; y} ∈ E : x; y ∈ Nv}|C2nv :

The quantities T (G) and C(G) are quite different.

Let G be K2;n−2 plus one edge between the vertices in the part of size 2. ThenC(G) ∼ 1, but T (G) = Θ(

1n).Very important! However, many inaccuracies in the literature.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36

Page 77: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 78: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 79: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 80: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 81: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).

Theorem (Ostroumova, Samosvat)

If in a sequence {Gn} of graphs, the degrees of the vertices follow a power lawwith exponent ∈ (2; 3), then T (Gn) → 0 as n → ∞.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 82: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).

Theorem (Ostroumova, Samosvat)

If in a sequence {Gn} of graphs, the degrees of the vertices follow a power lawwith exponent ∈ (2; 3), then T (Gn) → 0 as n → ∞.

Newman might be right, provided we do take into account multiple edges andloops.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 83: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).

Theorem (Ostroumova, Samosvat)

If in a sequence {Gn} of graphs, the degrees of the vertices follow a power lawwith exponent ∈ (2; 3), then T (Gn) → 0 as n → ∞.

Newman might be right, provided we do take into account multiple edges andloops.

Theorem (Ostroumova)

There exist sequences {Gn} of multigraphs with loops, whose degrees of thevertices follow a power law with exponent ∈ (2; 3) and, nevertheless,T (Gn) > const as n → ∞.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 84: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: experiments and models

Experimentally, C(WWW ) seems to be constant. However, no real data forT (WWW ). Newman asserts that T (WWW ) is constant as well without anyexplanations. This is wrong, provided we do not take into account multiple edgesand loops (which is widely done).

Theorem (Ostroumova, Samosvat)

If in a sequence {Gn} of graphs, the degrees of the vertices follow a power lawwith exponent ∈ (2; 3), then T (Gn) → 0 as n → ∞.

Newman might be right, provided we do take into account multiple edges andloops.

Theorem (Ostroumova)

There exist sequences {Gn} of multigraphs with loops, whose degrees of thevertices follow a power law with exponent ∈ (2; 3) and, nevertheless,T (Gn) > const as n → ∞.

However, what is T (G), if G has multiple edges and loops? Many differentdefinitions, and Newman does not say a word about this subtlety!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36

Page 85: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 86: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Theorem (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 87: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Theorem (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

To calculate the value E(T (Gnm)), one needs to know the number oftriangles and the number of 2-paths.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 88: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Theorem (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

To calculate the value E(T (Gnm)), one needs to know the number oftriangles and the number of 2-paths.

Recall that ](H;G) is the number of copies of a graph H in a graph G.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 89: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Theorem (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

To calculate the value E(T (Gnm)), one needs to know the number oftriangles and the number of 2-paths.

Recall that ](H;G) is the number of copies of a graph H in a graph G.

A general and nice result was proved by Ryabchenko and Samosvat.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 90: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: the B–R model

Theorem (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

To calculate the value E(T (Gnm)), one needs to know the number oftriangles and the number of 2-paths.

Recall that ](H;G) is the number of copies of a graph H in a graph G.

A general and nice result was proved by Ryabchenko and Samosvat.

Theorem (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 14 / 36

Page 91: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 92: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 93: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 94: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 95: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.

By Theorem 2 the number of K4 etc. is asymptotically constant: bad.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 96: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.

By Theorem 2 the number of K4 etc. is asymptotically constant: bad.

Unfortunately, C(Gnm) also approaches 0: even worse.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 97: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.

By Theorem 2 the number of K4 etc. is asymptotically constant: bad.

Unfortunately, C(Gnm) also approaches 0: even worse.

And, once again, = 3, not ∈ (2; 3).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 98: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Clustering: comments on the theorems for the B–Rmodel

Theorem 1 (Bollobas, Riordan)

The expected value of T (Gnm) tends to 0 as n → ∞: E(T (Gnm)) � ln2 nn .

Theorem 2 (Ryabchenko, Samosvat)

For any H , E(](H;Gnm)) � n](di=0) · (√n)](di=1) · (lnn)](di=2), where ](di = k) is

the number of vertices of degree k in H .

Theorem 2 agrees with Theorem 1: E(](K3; Gnm)) � ln3 n,E(](P2; Gnm)) � n lnn.

By Theorem 2 the number of K4 etc. is asymptotically constant: bad.

Unfortunately, C(Gnm) also approaches 0: even worse.

And, once again, = 3, not ∈ (2; 3).

So let’s tune the model and try to calculate again the number of smallsubgraphs!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 15 / 36

Page 99: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 100: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 101: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.

Case m = 1H1a;1 — graph with one vertex v1 and one loop.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 102: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.

Case m = 1H1a;1 — graph with one vertex v1 and one loop.

Given Hn−1a;1 we can make Hna;1 by adding vertex vn and an edge from it to avertex vi, picked from {v1; : : : ; vn} with probability

P(i = s) =

dHn−1a;1 (vs)+a−1

(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 103: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.

Case m = 1H1a;1 — graph with one vertex v1 and one loop.

Given Hn−1a;1 we can make Hna;1 by adding vertex vn and an edge from it to avertex vi, picked from {v1; : : : ; vn} with probability

P(i = s) =

dHn−1a;1 (vs)+a−1

(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n

For a = 1, we get the model of Bollobas–Riordan.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 104: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model

Which problems we had in the model of Bollobas–Riordan? Non-realistic exponentin the power-law, non-realistic clustering. Can solve the first problem! Thefollowing model is very close to the first one, but it has one important newparameter a > 0 called initial attractiveness of a vertex.

Case m = 1H1a;1 — graph with one vertex v1 and one loop.

Given Hn−1a;1 we can make Hna;1 by adding vertex vn and an edge from it to avertex vi, picked from {v1; : : : ; vn} with probability

P(i = s) =

dHn−1a;1 (vs)+a−1

(a+1)n−1 1 6 s 6 n− 1a(a+1)n−1 s = n

For a = 1, we get the model of Bollobas–Riordan.

Case m > 1

Given Hmna;1 we can make Hna;m by gluing {v1; : : : ; vm} into v′1 , {vm+1; : : : ; v2m}into v′2, and so on.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 16 / 36

Page 105: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 106: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 107: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:

Great, since now we can tune the model to get the expected exponent.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 108: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:

Great, since now we can tune the model to get the expected exponent.

Bad, since d 6 n1=(100(a+1)).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 109: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:

Great, since now we can tune the model to get the expected exponent.

Bad, since d 6 n1=(100(a+1)).

Completely solved.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 110: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)

If d 6 n1=(100(a+1)), then w.h.p.

|{v ∈ Hna;m : deg v = d}|n ∼ const(a;m)da+2:

Great, since now we can tune the model to get the expected exponent.

Bad, since d 6 n1=(100(a+1)).

Completely solved.

Many more further great features of the model!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 17 / 36

Page 111: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: second degrees of vertices

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36

Page 112: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: second degrees of vertices

Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36

Page 113: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: second degrees of vertices

Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36

Page 114: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: second degrees of vertices

Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.Theorem (Ostroumova, Grechnikov, Kupavskiy, Tetali)

W.h.p.|{i = 1; : : : ; n : d2(i) = d}|n ∼ const(a)da+1

:Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36

Page 115: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: second degrees of vertices

Let d2(t) = |{{i; j} : i 6= t; j 6= t; {i; t} ∈ E(Hna;1); {i; j} ∈ E(Hna;1)}|:So we calculate the number of edges of Hna;1 that are joined with a neighbour of agiven vertex t.Theorem (Ostroumova, Grechnikov, Kupavskiy, Tetali)

W.h.p.|{i = 1; : : : ; n : d2(i) = d}|n ∼ const(a)da+1

:Fits quite well to the real data.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 18 / 36

Page 116: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: the number of edgesbetween vertices of given degrees

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Page 117: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: the number of edgesbetween vertices of given degrees

Let Xn(d1; d2) be the total number of edges between vertices of given degrees.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Page 118: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: the number of edgesbetween vertices of given degrees

Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Page 119: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: the number of edgesbetween vertices of given degrees

Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!Very important, since

|{v ∈ Vn : deg v = d}| =1d∑d1

Xn(d1; d):Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Page 120: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: the number of edgesbetween vertices of given degrees

Let Xn(d1; d2) be the total number of edges between vertices of given degrees.Subtleties with the definition!Very important, since

|{v ∈ Vn : deg v = d}| =1d∑d1

Xn(d1; d):Theorem (Grechnikov)

W.h.p. Xn(d1; d2)n ∼ c(a;m)

(

(d1 + d2)1−ad2

1d22

) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 19 / 36

Page 121: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: “power and glory”

Theorem (Grechnikov)

Let d1 > m and d2 > m. Let X = Xn(d1; d2). There exists a function cX(d1; d2)such that

EXn(d1; d2) = cX(d1; d2)n +Oa;m(1)

andcX(d1; d2) =Γ(d1 −m+ma)Γ(d2 −m+ma)

Γ(d1 −m+ma+ 2)Γ(d2 −m+ma + 2)×

× Γ(d1 + d2 − 2m+ 2ma+ 3)

Γ(d1 + d2 − 2m+ 2ma+ a + 2)ma(a+ 1)

Γ(ma+ a + 1)

Γ(ma) ×

×(

1 + �(d1; d2)(d1 −m+ma + 1)(d2 −m +ma+ 1)

(d1 + d2 − 2m+ 2ma+ 1)(d1 + d2 − 2m+ 2ma+ 2)

) ;where

−4 +2

1 +ma 6 �(d1; d2) 6 aΓ(ma+ 1)Γ(2ma+ a+ 3)

Γ(2ma+ 2)Γ(ma + a+ 2):

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 20 / 36

Page 122: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Bollobas–Riordan model: “power and glory”

Theorem (Grechnikov)

If d1 < k, d2 < k or d1 = d2 = k, then X = 0. If d1 > k; d2 > k andd1 + d2 > 2k + 1, then the expected value of X is

EX =k(k + 1)d1(d1 + 1)d2(d2 + 1)

(

1 −Ck+1

2k+2Cd1−kd1+d2−2kCd1+1d1+d2+2

)

(2kt+ 1)−

−k∑n=1

Cd1−nd1+d2−2nd1d2Cd1d1+d2

(

(2n)!n!(n+ 1)!

k + 1

2k + [n = k] (2k)!2(k − 1)!2

)

− [d1 = k] (k − 1)(k + 1)

2kd2(d2 + 1)− [d2 = k] (k − 1)(k + 1)

2kd1(d1 + 1)+Ok;d1;d2

(

1t) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 21 / 36

Page 123: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: one more evidence of itsquality

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 22 / 36

Page 124: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: one more evidence of itsquality

Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 22 / 36

Page 125: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: one more evidence of itsquality

Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?We may try to find an optimal a by comparing the reality with the fact that thenumber of vertices of degree d is close to d−2−a (Grechnikov).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 22 / 36

Page 126: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: one more evidence of itsquality

Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?We may try to find an optimal a by comparing the reality with the fact that thenumber of vertices of degree d is close to d−2−a (Grechnikov).

We may try to find an optimal a by comparing the reality with the fact that thenumber of edges between vertices of degree d1 and d2 is close to(d1 + d2)

1−ad−21 d−2

2 (Grechnikov).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 22 / 36

Page 127: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: one more evidence of itsquality

Assume that the web-graph is governed by the Buckley–Osthus model. What isthe most likely parameter a?We may try to find an optimal a by comparing the reality with the fact that thenumber of vertices of degree d is close to d−2−a (Grechnikov).

We may try to find an optimal a by comparing the reality with the fact that thenumber of edges between vertices of degree d1 and d2 is close to(d1 + d2)

1−ad−21 d−2

2 (Grechnikov).

Assertion (Grechnikov, Zhukovskii, Vinogradov, Ostroumova, Pritykin,Gusev, Raigorodskii)

In both cases, the optimum is at the same a ≈ 0:27.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 22 / 36

Page 128: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 129: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 130: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 131: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).

For each pair of vertices of H calculate the expected number of edgesbetween them using Step 1 and Grechnikov’s results.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 132: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).

For each pair of vertices of H calculate the expected number of edgesbetween them using Step 1 and Grechnikov’s results.

Sum all the values found at Step 2.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 133: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).

For each pair of vertices of H calculate the expected number of edgesbetween them using Step 1 and Grechnikov’s results.

Sum all the values found at Step 2.

Compare the result of Step 3 with the real number of edges in H .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 134: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: an application

We have seen that the model fits quite well the reality. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm.How could we check automatically whether this graph is “expected” or it probablyrepresents an “unnatural” structure like a spam construction or an “explosion”(say, important news)?

An algorithm

Calculate the total degrees of all the vertices of H (in the completeweb-graph).

For each pair of vertices of H calculate the expected number of edgesbetween them using Step 1 and Grechnikov’s results.

Sum all the values found at Step 2.

Compare the result of Step 3 with the real number of edges in H .

The difference between the real and the expected values can be used as a feature.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 23 / 36

Page 135: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: clustering

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 24 / 36

Page 136: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: clustering

Theorem (Eggemann, Noble)

If a > 1, then E(](K3; Hna;m)) � lnn as n → ∞.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 24 / 36

Page 137: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: clustering

Theorem (Eggemann, Noble)

If a > 1, then E(](K3; Hna;m)) � lnn as n → ∞.

It’s remarkable that for a = 1 (i.e., for the B–R model) we had ln2 n instead oflnn.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 24 / 36

Page 138: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: clustering

Theorem (Eggemann, Noble)

If a > 1, then E(](K3; Hna;m)) � lnn as n → ∞.

It’s remarkable that for a = 1 (i.e., for the B–R model) we had ln2 n instead oflnn.

Theorem (Eggemann, Noble)

If a > 1, then E(](P2; Hna;m)) � n as n → ∞.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 24 / 36

Page 139: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: clustering

Theorem (Eggemann, Noble)

If a > 1, then E(](K3; Hna;m)) � lnn as n → ∞.

It’s remarkable that for a = 1 (i.e., for the B–R model) we had ln2 n instead oflnn.

Theorem (Eggemann, Noble)

If a > 1, then E(](P2; Hna;m)) � n as n → ∞.

Theorem (Eggemann, Noble)

If a > 1, then E(T (Hna;m)) � lnnn as n → ∞.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 24 / 36

Page 140: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: small subgraphs

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 25 / 36

Page 141: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: small subgraphs

Very recently Tilga has proved far-reaching generalizations and refinements of thetheorems by Bollobas–Riordan, Ryabchenko–Samosvat, and Eggemann–Noble.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 25 / 36

Page 142: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: small subgraphs

Very recently Tilga has proved far-reaching generalizations and refinements of thetheorems by Bollobas–Riordan, Ryabchenko–Samosvat, and Eggemann–Noble.

Theorem (Tilga)

For any a > 0 and any fixed graph F , the order of magnitude of E(](F;Hna;m)) isfound.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 25 / 36

Page 143: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: small subgraphs

Very recently Tilga has proved far-reaching generalizations and refinements of thetheorems by Bollobas–Riordan, Ryabchenko–Samosvat, and Eggemann–Noble.

Theorem (Tilga)

For any a > 0 and any fixed graph F , the order of magnitude of E(](F;Hna;m)) isfound.

The exact statement is quite cumbersome involving many parameters and cases.So we just give several most important and short enough corollaries.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 25 / 36

Page 144: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: paths and cliques

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 26 / 36

Page 145: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: paths and cliques

Theorem (Tilga)

Let m > 2 and a < 1, � = 1a+1 . Let Pl be a path of length l. Then for n → ∞,

E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 26 / 36

Page 146: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: paths and cliques

Theorem (Tilga)

Let m > 2 and a < 1, � = 1a+1 . Let Pl be a path of length l. Then for n → ∞,

E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk be a clique of size k, where 4 6 k 6 m + 1. Then for n → ∞,

E(](Kk; Hna;m)

)

=

n1+(�−1)(k−1) · Θ(mC2k) for a < 1k−2 ;lnn · Θ(mC2k) for a = 1k−2 ;Θ(mC2k) for a > 1k−2 :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 26 / 36

Page 147: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: paths and cliques

Theorem (Tilga)

Let m > 2 and a < 1, � = 1a+1 . Let Pl be a path of length l. Then for n → ∞,

E(](Pl; Hna;m)

)

=

{n(2�−1)k+1 · Θ(ml) for l = 2k;n(2�−1)k+1 · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk be a clique of size k, where 4 6 k 6 m + 1. Then for n → ∞,

E(](Kk; Hna;m)

)

=

n1+(�−1)(k−1) · Θ(mC2k) for a < 1k−2 ;lnn · Θ(mC2k) for a = 1k−2 ;Θ(mC2k) for a > 1k−2 :

For example, if a = 13 (close to 0:27), then the number of K5 is about logn, and

the number of K4 is about 4√n. Much more realistic than in the B–R model!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 26 / 36

Page 148: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: cycles and bicliques

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 27 / 36

Page 149: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: cycles and bicliques

Theorem (Tilga)

Let Cl be a cycle of length l. Then for n → ∞,

E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 27 / 36

Page 150: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: cycles and bicliques

Theorem (Tilga)

Let Cl be a cycle of length l. Then for n → ∞,

E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk;l be a biclique with 2 6 l 6 min{k;m}. Then for n → ∞,

E(](Kk;l; Hna;m)

)

=

nk(1+(�−1)l) · Θ(mkl) for a < 1l−1 ;(lnn)k · Θ(mkl) for a = 1l−1 ;Θ(mkl) for a > 1l−1 :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 27 / 36

Page 151: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Buckley–Osthus model: cycles and bicliques

Theorem (Tilga)

Let Cl be a cycle of length l. Then for n → ∞,

E(](Cl; Hna;m)

)

=

{n(2�−1)k · Θ(ml) for l = 2k;n(2�−1)k · lnn · Θ(ml) for l = 2k + 1:Theorem (Tilga)

Let Kk;l be a biclique with 2 6 l 6 min{k;m}. Then for n → ∞,

E(](Kk;l; Hna;m)

)

=

nk(1+(�−1)l) · Θ(mkl) for a < 1l−1 ;(lnn)k · Θ(mkl) for a = 1l−1 ;Θ(mkl) for a > 1l−1 :

The number of bicliques shows how many communities are formed. For example,if a = 1

3 (close to 0:27), then there are many Kk;4 and a lot of Kk;3, which wasimpossible in the B–R model (there are no vertices of degree < 3 in such graphs).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 27 / 36

Page 152: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Classical “Google” PageRankGn = (Vn; En) — web-graph.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Page 153: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Classical “Google” PageRankGn = (Vn; En) — web-graph.Classical “Google” PageRank is the solution �(n) = (�i(n))i=1;:::;|Vn| to thesystem �i(n) = c∑j→i �j(n)

outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Page 154: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Classical “Google” PageRankGn = (Vn; En) — web-graph.Classical “Google” PageRank is the solution �(n) = (�i(n))i=1;:::;|Vn| to thesystem �i(n) = c∑j→i �j(n)

outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:

The coordinates �i(n) follow a power-law similarly to the degrees (with anotherparameter of the law).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Page 155: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Classical “Google” PageRankGn = (Vn; En) — web-graph.Classical “Google” PageRank is the solution �(n) = (�i(n))i=1;:::;|Vn| to thesystem �i(n) = c∑j→i �j(n)

outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:

The coordinates �i(n) follow a power-law similarly to the degrees (with anotherparameter of the law).

The same is true in a Barabasi–Albert model.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Page 156: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Classical “Google” PageRankGn = (Vn; En) — web-graph.Classical “Google” PageRank is the solution �(n) = (�i(n))i=1;:::;|Vn| to thesystem �i(n) = c∑j→i �j(n)

outdeg j +1 − c|Vn| ; i = 1; : : : ; |Vn|:

The coordinates �i(n) follow a power-law similarly to the degrees (with anotherparameter of the law).

The same is true in a Barabasi–Albert model.

Theorem (Avrachenkov)

For i > 0

E�i(n) =1 − c1 + n ( 1

1 + c +cΓ (i+ 1

2

)

Γ(n + c

2 + 1)

(1 + c)Γ (i+ c2 + 1

)

Γ(n + 1

2

)

)

≈ 1 − c1 + n ( 1

1 + c +c

1 + c (i +1

2

)− 1+c2(n +

1

2

)1+c

2

) :Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 28 / 36

Page 157: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 158: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 159: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.

We have three parameters to be adjusted optimally.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 160: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.

We have three parameters to be adjusted optimally.

Two main parameters

An l-dimensional vector !, which gives us a weight of a vertex i ∈ V : theweight is f(!; i) = (!;vi) — scalar product.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 161: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.

We have three parameters to be adjusted optimally.

Two main parameters

An l-dimensional vector !, which gives us a weight of a vertex i ∈ V : theweight is f(!; i) = (!;vi) — scalar product.

An m-dimensional vector ', which gives us a weight of an edge i → j in E:the weight is g('; i → j) = ('; ei;j).

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 162: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRankGn = (Vn; En) — web-graph. In what follows, we omit n in the notation to makeit less cumbersome. So G = (V;E).

To each i ∈ V we assign an l-dimensional vector vi of features, and to each edgei → j in E we assign an m-dimensional vector ei;j of features.

We have three parameters to be adjusted optimally.

Two main parameters

An l-dimensional vector !, which gives us a weight of a vertex i ∈ V : theweight is f(!; i) = (!;vi) — scalar product.

An m-dimensional vector ', which gives us a weight of an edge i → j in E:the weight is g('; i → j) = ('; ei;j).

The third parameter c ∈ (0; 1) will appear on the next slide.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 29 / 36

Page 163: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRank

Main definition

Weighted PageRank is a vector � — the solution to a system � = A�, where A isa matrix with entriesAi;j = c f(!; j)

∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 30 / 36

Page 164: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRank

Main definition

Weighted PageRank is a vector � — the solution to a system � = A�, where A isa matrix with entriesAi;j = c f(!; j)

∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:Main problem

Let S be a measure of the difference between our PageRank �(!; '; c) and someestimates assigned to each document (to each i ∈ V ) according to a given searchquery (e.g., S is the standard deviation). Find

(!0; '0; c0) = argmin!;'S(�(!; '; c); vector of estimates):Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 30 / 36

Page 165: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

Weighted “Yandex” PageRank

Main definition

Weighted PageRank is a vector � — the solution to a system � = A�, where A isa matrix with entriesAi;j = c f(!; j)

∑k∈V f(!; k) + (1 − c) g('; i → j)∑h: i→h g('; i → h)

:Main problem

Let S be a measure of the difference between our PageRank �(!; '; c) and someestimates assigned to each document (to each i ∈ V ) according to a given searchquery (e.g., S is the standard deviation). Find

(!0; '0; c0) = argmin!;'S(�(!; '; c); vector of estimates):Work in progress. Recent advances are made in cooperation with Nesterov and hisresearch group.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 30 / 36

Page 166: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 31 / 36

Page 167: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models

Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 31 / 36

Page 168: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models

Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?

Many multiparametric models.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 31 / 36

Page 169: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models

Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?

Many multiparametric models.

A break-through is due to Ryabchenko, Samosvat, Ostroumova.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 31 / 36

Page 170: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models

Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?

Many multiparametric models.

A break-through is due to Ryabchenko, Samosvat, Ostroumova.

The PA-class

Let Gnm (n > n0) be a graph with n vertices {1; : : : ; n} and mn edges obtained asa result of the following random graph process. We start at the time n0 from anarbitrary graph Gn0m with n0 vertices and mn0 edges. On the (n+ 1)-th step(n > n0), we make the graph Gn+1m from Gnm by adding a new vertex n + 1 andm edges connecting this vertex to some m vertices from the set {1; : : : ; n; n+ 1}.Denote by dnv the degree of a vertex v in Gnm. Assume that for some constants Aand B the following conditions are satisfied:

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 31 / 36

Page 171: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: continuation

The PA-class conditions

P(dn+1v = dnv | Gnm) = 1 −Adnvn −B 1n +O((dnv )2n2

) ; 1 6 v 6 n ; (1)

P(dn+1v = dnv + 1 | Gnm) = Adnvn +B 1n +O((dnv )

2n2

) ; 1 6 v 6 n ; (2)

P(dn+1v = dnv + j | Gnm) = O((dnv )

2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 32 / 36

Page 172: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: continuation

The PA-class conditions

P(dn+1v = dnv | Gnm) = 1 −Adnvn −B 1n +O((dnv )2n2

) ; 1 6 v 6 n ; (1)

P(dn+1v = dnv + 1 | Gnm) = Adnvn +B 1n +O((dnv )

2n2

) ; 1 6 v 6 n ; (2)

P(dn+1v = dnv + j | Gnm) = O((dnv )

2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)

For A = 1=2, B = 0, we get Bollobas–Riordan.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 32 / 36

Page 173: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: continuation

The PA-class conditions

P(dn+1v = dnv | Gnm) = 1 −Adnvn −B 1n +O((dnv )2n2

) ; 1 6 v 6 n ; (1)

P(dn+1v = dnv + 1 | Gnm) = Adnvn +B 1n +O((dnv )

2n2

) ; 1 6 v 6 n ; (2)

P(dn+1v = dnv + j | Gnm) = O((dnv )

2n2

) ; 2 6 j 6 m; 1 6 v 6 n ; (3)

P(dn+1n+1 = m+ j) = O( 1n) ; 1 6 j 6 m : (4)

For A = 1=2, B = 0, we get Bollobas–Riordan.

For A = 1=(1 + a), B = ma=(1 + a), we get Buckley–Osthus.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 32 / 36

Page 174: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 175: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 176: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :

Theorem (Ostroumova, Ryabchenko, Samosvat)

If 2A < 1 then w.h.p. T (n) ∼ c(A;B;m) ;If 2A = 1 then w.h.p. T (n) ∼ c′(A;B;m)

lnn ;If 2A > 1 then for any " > 0 w.h.p. n1−2A−" 6 T (n) 6 n1−2A+" :

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 177: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :

Theorem (Ostroumova, Ryabchenko, Samosvat)

If 2A < 1 then w.h.p. T (n) ∼ c(A;B;m) ;If 2A = 1 then w.h.p. T (n) ∼ c′(A;B;m)

lnn ;If 2A > 1 then for any " > 0 w.h.p. n1−2A−" 6 T (n) 6 n1−2A+" :

Great, since in the first case, we have a constant clustering together withpower-law!

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 178: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :

Theorem (Ostroumova, Ryabchenko, Samosvat)

If 2A < 1 then w.h.p. T (n) ∼ c(A;B;m) ;If 2A = 1 then w.h.p. T (n) ∼ c′(A;B;m)

lnn ;If 2A > 1 then for any " > 0 w.h.p. n1−2A−" 6 T (n) 6 n1−2A+" :

Great, since in the first case, we have a constant clustering together withpower-law!

Open questions and questions under solution

(Ostroumova, Krot) The local clustering coefficient is constant in n in thenew models and is proportional to 1=d in the degree d.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 179: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.|{v ∈ Gnm : deg v = d}|n ∼ const(A;B;m)d1+1=A :

Theorem (Ostroumova, Ryabchenko, Samosvat)

If 2A < 1 then w.h.p. T (n) ∼ c(A;B;m) ;If 2A = 1 then w.h.p. T (n) ∼ c′(A;B;m)

lnn ;If 2A > 1 then for any " > 0 w.h.p. n1−2A−" 6 T (n) 6 n1−2A+" :

Great, since in the first case, we have a constant clustering together withpower-law!

Open questions and questions under solution

(Ostroumova, Krot) The local clustering coefficient is constant in n in thenew models and is proportional to 1=d in the degree d.What’s with “Google”, “Yandex” and other PageRanks in the new models?

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 33 / 36

Page 180: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 34 / 36

Page 181: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Another problem of the preferential attachment models: the older is a page, thegreater is the probability that it will gain more and more citations. That’s notcompletely realistic, for we have a large part of the Internet containing news andother media.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 34 / 36

Page 182: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Another problem of the preferential attachment models: the older is a page, thegreater is the probability that it will gain more and more citations. That’s notcompletely realistic, for we have a large part of the Internet containing news andother media.

Need to use an age factor for each page.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 34 / 36

Page 183: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Another problem of the preferential attachment models: the older is a page, thegreater is the probability that it will gain more and more citations. That’s notcompletely realistic, for we have a large part of the Internet containing news andother media.

Need to use an age factor for each page.

Samosvat and Ostroumova proposed a class of such models.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 34 / 36

Page 184: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Another problem of the preferential attachment models: the older is a page, thegreater is the probability that it will gain more and more citations. That’s notcompletely realistic, for we have a large part of the Internet containing news andother media.

Need to use an age factor for each page.

Samosvat and Ostroumova proposed a class of such models.

Let’s construct a sequence of random graphs {Gn}. For that, consider an integerconstant m (vertex outdegree), an integer function N(n), and a sequence ofmutually independent random variables �1; �2; : : : with some given distributiontaking positive values.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 34 / 36

Page 185: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 35 / 36

Page 186: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

First, take 2 vertices and 1 edge between them (graph Gn2 ). The first 2 vertices

have “inherent qualities” q(1) = �1, q(2) = �2. At the t+ 1-th step(2 6 t 6 n− 1) one vertex and m edges are added to Gnt :q(t+ 1) = �t+1; P(t + 1 → i) =

attrt(i)∑tj=1 attrt(j) ;

where

attrt(i) = (1 or q(i)) · (1 or dt(i)) · (1 or I[i > t−N(n)] or e− t−iN(n)

)

and dt(i) is the degree of the vertex i in Gnt .

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 35 / 36

Page 187: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

First, take 2 vertices and 1 edge between them (graph Gn2 ). The first 2 vertices

have “inherent qualities” q(1) = �1, q(2) = �2. At the t+ 1-th step(2 6 t 6 n− 1) one vertex and m edges are added to Gnt :q(t+ 1) = �t+1; P(t + 1 → i) =

attrt(i)∑tj=1 attrt(j) ;

where

attrt(i) = (1 or q(i)) · (1 or dt(i)) · (1 or I[i > t−N(n)] or e− t−iN(n)

)

and dt(i) is the degree of the vertex i in Gnt .

Let Gn = Gnn.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 35 / 36

Page 188: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 36 / 36

Page 189: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Theorem (Samosvat, Ostroumova)

The most realistic case is when attrt(i) = q(i) · e− t−iN(n) and �i have a power law(say, Pareto) distribution. In this case, not only w.h.p. the fraction of the numberof vertices with a given degree follows a power law with a tunable exponent, butalso the quantity e(T ), which is the fraction of edges that connect vertices withage difference greater than T , decreases exponentially, right as in the reality.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 36 / 36

Page 190: romip.ruromip.ru/russir2015/pdf/raigorodsky-present2.pdf · The main objects Real-world web-graph G = (V ; E), where V — set of web-pages, set of web-sites, set of web-hosts, and

“Fresh” models

Theorem (Samosvat, Ostroumova)

The most realistic case is when attrt(i) = q(i) · e− t−iN(n) and �i have a power law(say, Pareto) distribution. In this case, not only w.h.p. the fraction of the numberof vertices with a given degree follows a power law with a tunable exponent, butalso the quantity e(T ), which is the fraction of edges that connect vertices withage difference greater than T , decreases exponentially, right as in the reality.

The model is used in Yandex to substantially improve the crawling algorithm.

Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 36 / 36