romip.ru/russir2015/pdf/raigorodsky-present2.pdf
TRANSCRIPT
Models of Random Graphs and their Applications to the Web-graph Analysis
Andrei Raigorodskii
Lomonosov Moscow State University, Moscow Institute of Physics and Technology,
Yandex, Moscow, Russia
RUSSIR, 25–29 August 2015
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 1 / 36
The main objects
Real-world web-graph G = (V, E), where V is the
set of web-pages,
set of web-sites,
or set of web-hosts,
and E is the set of all hyperlinks between the vertices (nodes). Sometimes multiple edges are identified; sometimes multiple edges and even loops are allowed.
Why do we need a model? Many reasons!
Adjust algorithms;
Find unexpected structures (news, spam, etc.) using classifiers learnt on some features coming from models.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 2 / 36
How to construct a model?
Statistics
First, find some statistical properties of web-graphs that would describe most accurately the real-world structures.
Probability Theory
Then, take a random element G which takes values in a set of graphs on n vertices and has such a distribution that w.h.p. (with high probability, i.e., with probability approaching 1 as n → ∞) G has the same properties as the ones mentioned above.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 3 / 36
Some properties
Barabasi–Albert, Watts–Strogatz, Newman, and many others in the 90s–00s.
Web-graphs are sparse, i.e., their numbers of edges (links) are proportional to their numbers of vertices.
Web-graphs have a unique "giant" connected component.
Every two vertices in the giant component are connected by a short path (5–6 or 15–20 edges, depending on what we mean by web-graph): diam G ≈ 6 (the rule of 6 handshakes).
Web-graphs are robust when random vertices are destroyed (a giant component survives).
Web-graphs are vulnerable to attacks on hubs (many small components appear after a threshold is surpassed).
Many triangles — high clustering.
The degree distribution is close to a power law:
|{v ∈ V : deg v = d}| / n ∼ const / d^γ,
where γ ∈ (2, 3) depends on what we mean by web-graph.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 4 / 36
Erdos–Renyi model
Fix n ∈ N.
Let V_n = {1, 2, …, n}, m = C(n, 2) = n(n − 1)/2.
Fix p ∈ [0, 1].
Consider all the edges e_1, …, e_m of the complete graph with vertex set V_n.
Take these edges independently, each with probability p. Denote the resulting random graph by G(n, p). Now let n → ∞. We may vary p depending on n.
Of course, intuitively G(n, p) does not reflect the reality. At least, it seems unnatural to take the edges independently.
However, if p(n) = c/n, then the expected number of edges in G(n, p) is c(n − 1)/2, so that we get sparsity. And the diameter is known to be small enough.
But in this case the distribution of the degrees is binomial with parameters n − 1, p, which is close, in turn, to Poisson with expectation c. Not a power law at all!
It would be great to explain the power-law phenomenon somehow without losing sparsity and the diameter property.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 5 / 36
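The sparsity and Poisson-degree claims above are easy to check empirically. The sketch below (stdlib Python only; all names are my own, not from the slides) samples G(n, c/n) and tabulates the degree histogram, which concentrates around c instead of following a power law:

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the C(n, 2) possible edges kept independently."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

n, c = 2000, 5.0
rng = random.Random(0)
edges = erdos_renyi(n, c / n, rng)       # p(n) = c/n gives a sparse graph

deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

avg = 2 * len(edges) / n                 # concentrates around c
hist = Counter(deg[v] for v in range(n)) # close to Poisson(c), not a power law
print(f"edges={len(edges)}, average degree={avg:.2f}")
print("degree histogram (d: count):", sorted(hist.items())[:8])
```

The histogram's fast (factorial) decay in d is what distinguishes it from the heavy 1/d^γ tail observed in real web-graphs.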
Barabasi–Albert "model"
Fix an initial graph G_0.
Add new vertices (web sites) one by one.
Any new vertex sends m edges to some previous vertices with probabilities proportional to the degrees of these vertices.
This is called "preferential attachment": a new site prefers to send a link to the existing sites according to their current popularity.
Obviously we have sparsity: n vertices and mn edges.
Barabasi and Albert claim that this model has, w.h.p., small diameter and — most importantly — follows a power law.
Another pair of mathematicians, Bollobas and Riordan, criticised — not the idea! — its realization.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 6 / 36
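As the criticism below makes precise, Barabasi and Albert do not fix a joint distribution for the m targets, so any code is one realization among many. Here is one common realization (my own sketch, not the authors' specification): start from a small clique as G_0 and draw the m targets, distinct, via a stub list in which each old vertex appears once per unit of degree.

```python
import random

def preferential_attachment(n, m, rng):
    """ONE possible realization of the Barabasi-Albert process
    (the original papers leave the target distribution underspecified).
    G_0 is a clique on m + 1 vertices; each new vertex picks m distinct
    targets with probability proportional to current degree."""
    g0 = m + 1
    edges = [(i, j) for i in range(g0) for j in range(i + 1, g0)]
    stubs = [v for e in edges for v in e]   # vertex v appears deg(v) times
    for v in range(g0, n):
        targets = set()
        while len(targets) < m:             # resample until m distinct targets
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((v, t))
        stubs += [v] * m + list(targets)    # update degrees after all m choices
    return edges

edges = preferential_attachment(1000, 3, random.Random(1))
print("vertices=1000, edges=", len(edges))
```

Resampling to distinctness, allowing repeats, or drawing the m targets all at once give different joint distributions, which is exactly the ambiguity the next slide points out.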
Barabasi–Albert "model"
First, of course, the random graph depends greatly on the initial graph G_0.
Second, Barabasi and Albert do not say a word about how the set of m targets of a new vertex is distributed among the old ones.
Finally, B and R prove that, roughly speaking, one can organize the process (take a G_0 and a distribution of targets) in such a way that one gets, w.h.p., an arbitrary number of triangles in the resulting random graph.
In principle, it's good to have a class of models instead of just one model. The problem is that one can hardly understand from the papers by B and A for which concrete model from this class the results are stated. Moreover, sometimes the results of different papers are contradictory (cannot be true simultaneously).
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 7 / 36
Bollobas–Riordan model
Construct a random graph G_m^n with n vertices and mn edges, m ∈ N. Let d_G(v) be the degree of a vertex v in a graph G.
Case m = 1
G_1^1 is the graph with one vertex v_1 and one loop.
Given G_1^{n−1}, we make G_1^n by adding vertex v_n and an edge from it to a vertex v_i, picked from {v_1, …, v_n} with probability
P(i = s) = d_{G_1^{n−1}}(v_s) / (2n − 1) for 1 ≤ s ≤ n − 1, and 1/(2n − 1) for s = n.
Preferential attachment!
Case m > 1
Given G_1^{mn}, we make G_m^n by gluing {v_1, …, v_m} into v′_1, {v_{m+1}, …, v_{2m}} into v′_2, and so on.
The random graph G_m^n is certainly sparse. What about other properties?
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 8 / 36
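The two-step construction above (m = 1 process, then gluing) can be sketched directly; function names are my own. The stub-list trick makes the attachment preferential: just before step k the list holds 2k − 1 entries, the new vertex once and each older vertex v_s exactly deg(v_s) times, matching the P(i = s) formula above.

```python
import random

def bollobas_riordan(n, m, rng):
    """Sketch of G_m^n: run the m = 1 process for mn steps, then glue
    consecutive blocks of m vertices. Loops and multiple edges are kept."""
    edges1, stubs = [], []
    for v in range(1, m * n + 1):
        stubs.append(v)                  # the new vertex may receive a loop
        target = rng.choice(stubs)       # uniform over 2v - 1 stubs
        stubs.append(target)             # each edge leaves one stub per endpoint
        edges1.append((v, target))
    # gluing: v_1..v_m -> v'_1, v_{m+1}..v_{2m} -> v'_2, ...
    glue = lambda v: (v - 1) // m + 1
    return [(glue(u), glue(w)) for u, w in edges1]

edges = bollobas_riordan(500, 2, random.Random(2))
print("n=500, m=2 ->", len(edges), "edges (multigraph with loops)")
```

A loop at v contributes two identical stubs, so it counts 2 towards deg(v), as the degree convention in the model requires.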
Diameter and giant components
Theorem (Bollobas, Riordan)
If m ≥ 2, then w.h.p. diam G_m^n ∼ ln n / ln ln n.
Great, since for realistic values of n we get ln n / ln ln n ∈ [5, 15].
Theorem (Bollobas, Riordan)
If p ∈ (0, 1) and we make a random subgraph G_{m,p}^n of the graph G_m^n by deleting its vertices independently, each with probability p, then w.h.p. G_{m,p}^n contains a connected component of size Θ(n).
Great, since we have the robustness property.
Theorem (Bollobas, Riordan)
If c ∈ (0, 1) and we make a random subgraph G_{m,c}^n of the graph G_m^n by deleting its first [cn] vertices, then for c ≤ (m − 1)/(m + 1), w.h.p. G_{m,c}^n contains a connected component of size Θ(n), and for c > (m − 1)/(m + 1), w.h.p. all the connected components of G_{m,c}^n are of size o(n).
Great, since we have the vulnerability to attacks on the hubs.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 9 / 36
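Both deletion regimes can be simulated on a small instance (a sketch under my own naming; the oldest vertices play the role of hubs, since preferential attachment concentrates degree on them):

```python
import random

def br_graph(n, m, rng):
    """Preferential-attachment multigraph in the spirit of G_m^n (sketch)."""
    edges, stubs = [], []
    for v in range(1, m * n + 1):
        stubs.append(v)
        t = rng.choice(stubs)
        stubs.append(t)
        edges.append(((v - 1) // m + 1, (t - 1) // m + 1))
    return edges

def largest_component(vertices, edges):
    """Union-find over surviving vertices; edges touching deleted ones drop out."""
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, w in edges:
        if u in parent and w in parent and find(u) != find(w):
            parent[find(u)] = find(w)
    sizes = {}
    for v in vertices:
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) if sizes else 0

n, m = 1000, 2
rng = random.Random(3)
edges = br_graph(n, m, rng)

# random failures: each vertex survives with probability 1/2
survivors = [v for v in range(1, n + 1) if rng.random() < 0.5]
giant = largest_component(survivors, edges)

# targeted attack: delete the first [cn] (oldest) vertices, c > (m-1)/(m+1) = 1/3
c = 0.6
remaining = list(range(int(c * n) + 1, n + 1))
after_attack = largest_component(remaining, edges)

print(f"random deletion: largest component {giant} of {len(survivors)}")
print(f"attack on first {int(c * n)} vertices: largest component {after_attack} of {len(remaining)}")
```

On typical runs the random-failure graph retains a component of linear size, while the targeted attack above the (m − 1)/(m + 1) threshold shatters the graph, in line with the theorems above.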
Degree distribution
Theorem (Bollobas, Riordan, Spencer, Tusnady)
If d ≤ n^{1/15}, then w.h.p.
|{v ∈ G_m^n : deg v = d}| / n ∼ const(m) / d³.
Great, since we get a power law.
Not too great, since the exponent in the power law is a bit different from the experimental ones (γ ∈ (2, 3)).
Bad, since d ≤ n^{1/15}, which is unrealistic.
The last problem is completely solved: there is an analogue of the B–R–S–T theorem for arbitrary d.
Can we tune the model somehow to get other exponents in the power law?
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 10 / 36
Clustering
Let G = (V, E), v ∈ V. Let N_v be the set of neighbours of v in G, and let n_v = |N_v|. If n_v ≥ 2, then
C_v = |{{x, y} ∈ E : x, y ∈ N_v}| / C(n_v, 2).
Global clustering coefficient
The global clustering coefficient of G is
T(G) = Σ_{v∈V} C(n_v, 2) C_v / Σ_{v∈V} C(n_v, 2).
Let #(H, G) be the number of copies of a graph H in a graph G. Then
T(G) = 3 #(K_3, G) / #(P_2, G),
where K_3 is a triangle and P_2 is a 2-path.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 11 / 36
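The two formulas for T(G) can be checked against each other on a small graph (a sketch with my own function names; note that C(n_v, 2)·C_v is just the number of edges inside N_v):

```python
from itertools import combinations
from math import comb

def global_clustering(vertices, edges):
    """T(G) as the weighted average of local C_v with weights C(n_v, 2)."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    num = den = 0
    for v in vertices:
        nv = len(adj[v])
        if nv >= 2:
            links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
            num += links           # = C(n_v, 2) * C_v
            den += comb(nv, 2)
    return num / den

def triangles_and_cherries(vertices, edges):
    """#(K_3, G) and #(P_2, G), so that T(G) = 3 #(K_3, G) / #(P_2, G)."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    tri = sum(1 for a, b, c in combinations(vertices, 3)
              if b in adj[a] and c in adj[a] and c in adj[b])
    cherries = sum(comb(len(adj[v]), 2) for v in vertices)  # 2-paths, by centre
    return tri, cherries

V = range(5)
E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]  # two triangles sharing vertex 2
t1 = global_clustering(V, E)
k3, p2 = triangles_and_cherries(V, E)
print(t1, 3 * k3 / p2)  # both equal 0.6
```

Counting 2-paths by their centre vertex gives #(P_2, G) = Σ_v C(deg v, 2), which is exactly the denominator of the weighted average, so the two definitions coincide.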
Clustering
Average local clustering coefficient
The average local clustering coefficient of G is
C(G) = (1/|V|) Σ_{v∈V} C_v,
where, again, C_v = |{{x, y} ∈ E : x, y ∈ N_v}| / C(n_v, 2).
The quantities T(G) and C(G) are quite different.
Let G be K_{2,n−2} plus one edge between the two vertices in the part of size 2. Then C(G) ∼ 1, but T(G) = Θ(1/n).
Very important! However, there are many inaccuracies in the literature.
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 12 / 36
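The K_{2,n−2} example can be computed exactly (a sketch with my own naming). Each of the n − 2 degree-2 vertices has C_v = 1, the two hubs have C_v = 2/(n − 1), so C(G) → 1, while the graph has only n − 2 triangles against roughly n²/1 cherries centred at the hubs, giving T(G) = 3/n exactly:

```python
from itertools import combinations
from math import comb

def local_and_global(vertices, edges):
    """Average local coefficient C(G) and global coefficient T(G)."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    cs, num, den = [], 0, 0
    for v in vertices:
        nv = len(adj[v])
        if nv >= 2:
            links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
            cs.append(links / comb(nv, 2))
            num += links
            den += comb(nv, 2)
    return sum(cs) / len(vertices), num / den

# K_{2, n-2} plus the edge {a, b} inside the size-2 part
n = 100
a, b = 0, 1
V = range(n)
E = [(a, b)] + [(a, v) for v in range(2, n)] + [(b, v) for v in range(2, n)]
C, T = local_and_global(V, E)
print(f"n={n}: C(G)={C:.4f} (-> 1), T(G)={T} (= 3/n = {3 / n})")
```

This is the inaccuracy trap the slide warns about: quoting a constant "clustering coefficient" without saying which of the two quantities is meant.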
Clustering: experiments and models
Experimentally, C(WWW) seems to be constant. However, there is no real data for T(WWW). Newman asserts that T(WWW) is constant as well, without any explanation. This is wrong, provided we do not take into account multiple edges and loops (which is widely done).
Theorem (Ostroumova, Samosvat)
If in a sequence {G_n} of graphs the degrees of the vertices follow a power law with exponent γ ∈ (2, 3), then T(G_n) → 0 as n → ∞.
Newman might be right, provided we do take into account multiple edges and loops.
Theorem (Ostroumova)
There exist sequences {G_n} of multigraphs with loops whose vertex degrees follow a power law with exponent γ ∈ (2, 3) and, nevertheless, T(G_n) ≥ const as n → ∞.
However, what is T(G) if G has multiple edges and loops? There are many different definitions, and Newman does not say a word about this subtlety!
Andrei Raigorodskii (MSU, MIPT, YND) RUSSIR 13 / 36
Clustering: the B–R model

Theorem (Bollobas, Riordan)
The expected value of T(G^n_m) tends to 0 as n → ∞: E(T(G^n_m)) ≍ (ln² n)/n.

To calculate the value E(T(G^n_m)), one needs to know the number of triangles and the number of 2-paths.

Recall that #(H, G) denotes the number of copies of a graph H in a graph G.

A general and nice result was proved by Ryabchenko and Samosvat.

Theorem (Ryabchenko, Samosvat)
For any H, E(#(H, G^n_m)) ≍ n^{#(d_i = 0)} · (√n)^{#(d_i = 1)} · (ln n)^{#(d_i = 2)}, where #(d_i = k) is the number of vertices of degree k in H. In particular, vertices of degree at least 3 contribute no growing factor.
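The quantities behind these theorems (triangles, 2-paths, and the global clustering T) are easy to compute for a concrete graph. A minimal sketch in Python; the dictionary-of-neighbour-sets representation is our own choice for illustration, not anything from the slides:

```python
from itertools import combinations

def global_clustering(adj):
    """Global clustering T(G) = 3 * (#triangles) / (#paths of length 2),
    for a simple undirected graph given as {vertex: set_of_neighbours}."""
    triangles = 0
    paths2 = 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        paths2 += d * (d - 1) // 2          # 2-paths centred at v
        for u, w in combinations(nbrs, 2):  # closed neighbour pairs form triangles
            if w in adj[u]:
                triangles += 1
    triangles //= 3                         # each triangle is counted at all 3 vertices
    return 3 * triangles / paths2 if paths2 else 0.0

# K4: every 2-path closes into a triangle, so T = 1
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(global_clustering(k4))  # → 1.0
```

On a path graph the same function returns 0, since no 2-path closes.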
Clustering: comments on the theorems for the B–R model

Theorem 1 (Bollobas, Riordan)
The expected value of T(G^n_m) tends to 0 as n → ∞: E(T(G^n_m)) ≍ (ln² n)/n.

Theorem 2 (Ryabchenko, Samosvat)
For any H, E(#(H, G^n_m)) ≍ n^{#(d_i = 0)} · (√n)^{#(d_i = 1)} · (ln n)^{#(d_i = 2)}, where #(d_i = k) is the number of vertices of degree k in H.

Theorem 2 agrees with Theorem 1: E(#(K3, G^n_m)) ≍ ln³ n and E(#(P2, G^n_m)) ≍ n ln n.

By Theorem 2, the number of copies of K4, etc., is asymptotically constant: bad.

Unfortunately, C(G^n_m) also approaches 0: even worse.

And, once again, the power-law exponent equals 3 rather than lying in (2, 3).

So let's tune the model and try to calculate the number of small subgraphs again!
Buckley–Osthus model

Which problems did we have in the Bollobas–Riordan model? An unrealistic exponent in the power law and unrealistic clustering. We can solve the first problem! The following model is very close to the previous one, but it has one important new parameter a > 0, called the initial attractiveness of a vertex.

Case m = 1
H^1_{a,1} is the graph with one vertex v1 and one loop.

Given H^{n−1}_{a,1}, we make H^n_{a,1} by adding a vertex vn and an edge from it to a vertex vi, picked from {v1, …, vn} with probability

P(i = s) = (d_{H^{n−1}_{a,1}}(v_s) + a − 1) / ((a + 1)n − 1) for 1 ≤ s ≤ n − 1,
P(i = n) = a / ((a + 1)n − 1).

For a = 1, we get the model of Bollobas–Riordan.

Case m > 1
Given H^{mn}_{a,1}, we make H^n_{a,m} by gluing {v_1, …, v_m} into v′_1, {v_{m+1}, …, v_{2m}} into v′_2, and so on.
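The m = 1 recursion is straightforward to simulate. A sketch in Python transcribing the attachment probabilities above directly (the function name and fixed seeding are our own; the m > 1 gluing step is omitted):

```python
import random

def buckley_osthus(n, a, seed=0):
    """Sample H^n_{a,1}: start with one vertex carrying a loop; the new vertex
    attaches to an existing vertex s with probability (deg(v_s)+a-1)/((a+1)n-1)
    and loops to itself with probability a/((a+1)n-1)."""
    rng = random.Random(seed)
    deg = [2]                        # v1 with a loop: degree 2
    edges = [(0, 0)]
    for t in range(1, n):            # t is the 0-based index of the new vertex
        total = (a + 1) * (t + 1) - 1
        r = rng.uniform(0, total)
        acc = 0.0
        target = t                   # default: loop at the new vertex (weight a)
        for s in range(t):
            acc += deg[s] + a - 1
            if r < acc:
                target = s
                break
        deg.append(1)
        deg[target] += 1             # a loop (target == t) adds 2 in total
        edges.append((t, target))
    return deg, edges

deg, edges = buckley_osthus(1000, 0.27)
print(len(edges), sum(deg))  # → 1000 2000
```

Note that the weights sum to exactly (a + 1)(t + 1) − 1 at every step, matching the denominator on the slide, and that a = 1 recovers the Bollobas–Riordan process.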
Buckley–Osthus model: degree distribution

Theorem (Buckley, Osthus)
If d ≤ n^{1/(100(a+1))}, then w.h.p.

|{v ∈ H^n_{a,m} : deg v = d}| / n ∼ const(a, m) · d^{−(a+2)}.

Great, since now we can tune the model to get the expected exponent.

Bad, because of the restriction d ≤ n^{1/(100(a+1))}.

This issue has been completely solved.

The model has many more great features!
Buckley–Osthus model: second degrees of vertices

Let d₂(t) = |{{i, j} : i ≠ t, j ≠ t, {i, t} ∈ E(H^n_{a,1}), {i, j} ∈ E(H^n_{a,1})}|.

So we count the number of edges of H^n_{a,1} that are joined to a neighbour of a given vertex t.

Theorem (Ostroumova, Grechnikov, Kupavskiy, Tetali)
W.h.p.

|{i = 1, …, n : d₂(i) = d}| / n ∼ const(a) · d^{−(a+1)}.

This fits the real data quite well.
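For a concrete undirected graph, d₂(t) can be computed directly from the definition. A small Python sketch (the neighbour-set representation is chosen for illustration):

```python
def second_degree(adj, t):
    """d2(t): number of edges {i, j} with i != t, j != t and i adjacent to t,
    for a simple undirected graph given as {vertex: set_of_neighbours}."""
    nbrs = adj[t] - {t}
    counted = set()
    for i in nbrs:
        for j in adj[i]:
            if j != t and i != j:
                counted.add(frozenset((i, j)))  # deduplicate undirected edges
    return len(counted)

# 4-cycle 0-1-2-3-0: the edges joined to a neighbour of 0 are {1,2} and {2,3}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(second_degree(c4, 0))  # → 2
```

For a star centred at t the answer is 0: every edge touches t itself.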
Buckley–Osthus model: the number of edges between vertices of given degrees

Let X_n(d1, d2) be the total number of edges between vertices of degrees d1 and d2. There are subtleties with the definition!

Very important, since

|{v ∈ V_n : deg v = d}| = (1/d) Σ_{d1} X_n(d1, d).

Theorem (Grechnikov)
W.h.p.

X_n(d1, d2) / n ∼ c(a, m) · (d1 + d2)^{1−a} · d1^{−2} · d2^{−2}.
Buckley–Osthus model: “power and glory”

Theorem (Grechnikov)
Let d1 ≥ m and d2 ≥ m. Let X = X_n(d1, d2). There exists a function c_X(d1, d2) such that

E X_n(d1, d2) = c_X(d1, d2) · n + O_{a,m}(1)

and

$$
\begin{aligned}
c_X(d_1, d_2) = {} & \frac{\Gamma(d_1 - m + ma)\,\Gamma(d_2 - m + ma)}{\Gamma(d_1 - m + ma + 2)\,\Gamma(d_2 - m + ma + 2)} \times \frac{\Gamma(d_1 + d_2 - 2m + 2ma + 3)}{\Gamma(d_1 + d_2 - 2m + 2ma + a + 2)}\, ma(a+1)\,\frac{\Gamma(ma + a + 1)}{\Gamma(ma)} \\
& \times \left(1 + \theta(d_1, d_2)\,\frac{(d_1 - m + ma + 1)(d_2 - m + ma + 1)}{(d_1 + d_2 - 2m + 2ma + 1)(d_1 + d_2 - 2m + 2ma + 2)}\right),
\end{aligned}
$$

where

$$
-4 + \frac{2}{1 + ma} \;\le\; \theta(d_1, d_2) \;\le\; \frac{a\,\Gamma(ma + 1)\,\Gamma(2ma + a + 3)}{\Gamma(2ma + 2)\,\Gamma(ma + a + 2)}.
$$
Bollobas–Riordan model: “power and glory”

Theorem (Grechnikov)
If d1 < k, d2 < k, or d1 = d2 = k, then X = 0. If d1 ≥ k, d2 ≥ k, and d1 + d2 ≥ 2k + 1, then the expected value of X is

$$
\begin{aligned}
\mathrm{E}X = {} & \frac{k(k+1)}{d_1(d_1+1)\,d_2(d_2+1)} \left(1 - \frac{C_{2k+2}^{k+1}\,C_{d_1+d_2-2k}^{d_1-k}}{C_{d_1+d_2+2}^{d_1+1}}\right)(2kt + 1) \\
& - \sum_{n=1}^{k} \frac{C_{d_1+d_2-2n}^{d_1-n}}{d_1 d_2\, C_{d_1+d_2}^{d_1}} \left(\frac{(2n)!}{n!\,(n+1)!} \cdot \frac{k+1}{2k} + [n = k]\,\frac{(2k)!}{2\,((k-1)!)^2}\right) \\
& - [d_1 = k]\,\frac{(k-1)(k+1)}{2k\,d_2(d_2+1)} - [d_2 = k]\,\frac{(k-1)(k+1)}{2k\,d_1(d_1+1)} + O_{k,d_1,d_2}\!\left(\frac{1}{t}\right).
\end{aligned}
$$

(Here t denotes the number of vertices and k plays the role of m.)
Buckley–Osthus model: one more piece of evidence of its quality

Assume that the web-graph is governed by the Buckley–Osthus model. What is the most likely parameter a?

We may try to find an optimal a by comparing reality with the fact that the number of vertices of degree d is close to d^{−2−a} (Grechnikov).

We may try to find an optimal a by comparing reality with the fact that the number of edges between vertices of degrees d1 and d2 is close to (d1 + d2)^{1−a} · d1^{−2} · d2^{−2} (Grechnikov).

Assertion (Grechnikov, Zhukovskii, Vinogradov, Ostroumova, Pritykin, Gusev, Raigorodskii)
In both cases, the optimum is at the same a ≈ 0.27.
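One simple way to carry out such a comparison is to fit the exponent of the empirical degree distribution on a log-log scale and read off a as the exponent minus 2. A sketch on noiseless synthetic counts; the plain least-squares fit is an illustration only, not the procedure the authors actually used (real data needs binning and careful tail handling):

```python
import math

def fit_power_exponent(counts):
    """Least-squares slope of log(count) vs log(degree); counts: {d: N_d}.
    Returns gamma, assuming N_d ~ d^{-gamma}."""
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope

# Synthetic counts exactly following d^{-2.27}, so gamma = 2.27 and a = 0.27
counts = {d: 1e6 * d ** -2.27 for d in range(1, 51)}
gamma = fit_power_exponent(counts)
print(round(gamma - 2, 2))  # → 0.27
```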
Buckley–Osthus model: an application

We have seen that the model fits reality quite well. How could we apply it?

Assume that a subgraph H of the real web-graph has been found by an algorithm. How could we check automatically whether this graph is “expected” or probably represents an “unnatural” structure, like a spam construction or an “explosion” (say, important news)?

An algorithm
1. Calculate the total degrees of all the vertices of H (in the complete web-graph).
2. For each pair of vertices of H, calculate the expected number of edges between them using Step 1 and Grechnikov’s results.
3. Sum all the values found at Step 2.
4. Compare the result of Step 3 with the real number of edges in H.

The difference between the real and the expected values can be used as a feature.
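The four steps might be sketched as follows, using the asymptotic shape (d1 + d2)^{1−a} d1^{−2} d2^{−2} from Grechnikov’s theorem as the expected edge count. The constants A_HAT and C_NORM are placeholders that would need calibration; they are assumptions of this sketch, not values from the slides:

```python
from itertools import combinations

A_HAT = 0.27     # fitted initial attractiveness (assumed)
C_NORM = 1.0     # calibration constant standing in for c(a, m) * n (assumed)

def expected_edges(d1, d2, a=A_HAT, c=C_NORM):
    """Expected number of edges between vertices of full degrees d1, d2."""
    return c * (d1 + d2) ** (1 - a) / (d1 ** 2 * d2 ** 2)

def anomaly_feature(sub_vertices, sub_edges, full_degree):
    """Real minus expected edge count inside a found subgraph H.
    full_degree maps each vertex of H to its degree in the whole web-graph."""
    expected = sum(expected_edges(full_degree[u], full_degree[v])
                   for u, v in combinations(sub_vertices, 2))   # steps 2 and 3
    return len(sub_edges) - expected                            # step 4

# A dense cluster of low-degree pages looks "unnatural": feature >> 0
verts = list(range(6))
edges = [(u, v) for u, v in combinations(verts, 2)]   # H is a clique K6
deg = {v: 10 for v in verts}
print(anomaly_feature(verts, edges, deg) > 0)  # → True
```

A strongly positive feature flags unexpectedly dense structures; a strongly negative one, unexpectedly sparse ones.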
Buckley–Osthus model: clustering

Theorem (Eggemann, Noble)
If a > 1, then E(#(K3, H^n_{a,m})) ≍ ln n as n → ∞.

It is remarkable that for a = 1 (i.e., for the B–R model) we had ln² n instead of ln n.

Theorem (Eggemann, Noble)
If a > 1, then E(#(P2, H^n_{a,m})) ≍ n as n → ∞.

Theorem (Eggemann, Noble)
If a > 1, then E(T(H^n_{a,m})) ≍ (ln n)/n as n → ∞.
Buckley–Osthus model: small subgraphs

Very recently, Tilga proved far-reaching generalizations and refinements of the theorems by Bollobas–Riordan, Ryabchenko–Samosvat, and Eggemann–Noble.

Theorem (Tilga)
For any a > 0 and any fixed graph F, the order of magnitude of E(#(F, H^n_{a,m})) is found.

The exact statement is quite cumbersome, involving many parameters and cases, so we give only a few of the most important and reasonably short corollaries.
Buckley–Osthus model: paths and cliques

Theorem (Tilga)
Let m ≥ 2 and a < 1, and put β = 1/(a + 1). Let P_l be a path of length l. Then, as n → ∞,

E(#(P_l, H^n_{a,m})) = n^{(2β−1)k+1} · Θ(m^l) for l = 2k, and n^{(2β−1)k+1} · ln n · Θ(m^l) for l = 2k + 1.

Theorem (Tilga)
Let K_k be a clique of size k, where 4 ≤ k ≤ m + 1. Then, as n → ∞,

E(#(K_k, H^n_{a,m})) = n^{1+(β−1)(k−1)} · Θ(m^{C(k,2)}) for a < 1/(k−2); ln n · Θ(m^{C(k,2)}) for a = 1/(k−2); Θ(m^{C(k,2)}) for a > 1/(k−2),

where C(k, 2) = k(k − 1)/2.

For example, if a = 1/3 (close to 0.27), then the number of copies of K5 is about ln n, and the number of copies of K4 is about n^{1/4}. Much more realistic than in the B–R model!
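The a = 1/3 example can be checked mechanically from the clique corollary. A small sketch with exact rational arithmetic (the function name and return conventions are our own):

```python
from fractions import Fraction

def clique_growth(k, a):
    """Order of E(#(K_k, H^n_{a,m})) from Tilga's clique theorem:
    returns the exponent of n, 'log' for the ln n case, or 'const'."""
    beta = Fraction(1, 1) / (a + 1)
    threshold = Fraction(1, k - 2)
    if a < threshold:
        return 1 + (beta - 1) * (k - 1)   # n raised to this power
    if a == threshold:
        return "log"                      # about ln n copies
    return "const"                        # constantly many copies

a = Fraction(1, 3)
print(clique_growth(4, a))  # → 1/4, i.e. about n^(1/4) copies of K4
print(clique_growth(5, a))  # → log, i.e. about ln n copies of K5
```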
Buckley–Osthus model: cycles and bicliques

Theorem (Tilga)
Let C_l be a cycle of length l. Then, as n → ∞,

E(#(C_l, H^n_{a,m})) = n^{(2β−1)k} · Θ(m^l) for l = 2k, and n^{(2β−1)k} · ln n · Θ(m^l) for l = 2k + 1

(as before, β = 1/(a + 1)).

Theorem (Tilga)
Let K_{k,l} be a biclique (complete bipartite graph) with 2 ≤ l ≤ min{k, m}. Then, as n → ∞,

E(#(K_{k,l}, H^n_{a,m})) = n^{k(1+(β−1)l)} · Θ(m^{kl}) for a < 1/(l−1); (ln n)^k · Θ(m^{kl}) for a = 1/(l−1); Θ(m^{kl}) for a > 1/(l−1).

The number of bicliques shows how many communities are formed. For example, if a = 1/3 (close to 0.27), then there are many copies of K_{k,4} and a lot of copies of K_{k,3}, which was impossible in the B–R model (such graphs have no vertices of degree < 3).
Classical “Google” PageRank

G_n = (V_n, E_n) is the web-graph.

Classical “Google” PageRank is the solution π(n) = (π_i(n))_{i=1,…,|V_n|} to the system

π_i(n) = c · Σ_{j→i} π_j(n)/outdeg(j) + (1 − c)/|V_n|, i = 1, …, |V_n|.

The coordinates π_i(n) follow a power law, similarly to the degrees (with another parameter of the law).

The same is true in a Barabasi–Albert model.

Theorem (Avrachenkov)
For i > 0,

$$
\mathrm{E}\pi_i(n) = \frac{1-c}{1+n}\left(\frac{1}{1+c} + \frac{c\,\Gamma\!\left(i+\tfrac12\right)\Gamma\!\left(n+\tfrac{c}{2}+1\right)}{(1+c)\,\Gamma\!\left(i+\tfrac{c}{2}+1\right)\Gamma\!\left(n+\tfrac12\right)}\right) \approx \frac{1-c}{1+n}\left(\frac{1}{1+c} + \frac{c}{1+c}\cdot\left(i+\tfrac12\right)^{-\frac{1+c}{2}}\left(n+\tfrac12\right)^{\frac{1+c}{2}}\right).
$$
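The defining system is the usual PageRank fixed point and can be solved by power iteration. A sketch in Python (no dangling-node handling; the three-page example graph is our own):

```python
def pagerank(out_links, c=0.85, iters=100):
    """Solve pi_i = c * sum_{j -> i} pi_j / outdeg(j) + (1 - c) / |V|
    by power iteration; out_links[j] lists the targets of page j's hyperlinks."""
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - c) / n] * n        # teleportation term
        for j, targets in enumerate(out_links):
            share = c * pi[j] / len(targets)
            for i in targets:
                nxt[i] += share          # mass flowing along hyperlink j -> i
        pi = nxt
    return pi

# 3-page web: 0 -> 1, 1 -> 2, 2 -> 0; all ranks are equal by symmetry
print([round(p, 4) for p in pagerank([[1], [2], [0]])])  # → [0.3333, 0.3333, 0.3333]
```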
Weighted “Yandex” PageRank

G_n = (V_n, E_n) is the web-graph. In what follows, we omit n in the notation to make it less cumbersome, so G = (V, E).

To each i ∈ V we assign an l-dimensional vector v_i of features, and to each edge i → j in E we assign an m-dimensional vector e_{i,j} of features.

We have three parameters to be adjusted optimally.

Two main parameters
An l-dimensional vector ω, which gives the weight of a vertex i ∈ V: the weight is f(ω, i) = (ω, v_i), a scalar product.
An m-dimensional vector φ, which gives the weight of an edge i → j in E: the weight is g(φ, i → j) = (φ, e_{i,j}).

The third parameter, c ∈ (0, 1), will appear on the next slide.
Weighted “Yandex” PageRank

Main definition
Weighted PageRank is the vector π solving the system π = Aπ, where A is a matrix with entries

A_{i,j} = c · f(ω, j) / Σ_{k∈V} f(ω, k) + (1 − c) · g(φ, i → j) / Σ_{h: i→h} g(φ, i → h).

Main problem
Let S be a measure of the difference between our PageRank π(ω, φ, c) and some estimates assigned to each document (to each i ∈ V) according to a given search query (e.g., S is the standard deviation). Find

(ω₀, φ₀, c₀) = argmin_{ω,φ,c} S(π(ω, φ, c), vector of estimates).

Work in progress. Recent advances have been made in cooperation with Nesterov and his research group.
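Reading A as a row-stochastic transition matrix and π = Aπ as its stationary distribution (one natural interpretation of the definition above), the weighted rank can also be computed by power iteration. This sketch assumes positive weights and that every vertex has at least one out-edge:

```python
def weighted_pagerank(vertex_w, edge_w, c=0.85, iters=200):
    """Stationary distribution of A with
    A[i][j] = c * f(j)/sum_k f(k) + (1-c) * g(i->j)/sum_h g(i->h).
    vertex_w[i] > 0 is the vertex weight f(omega, i); edge_w[i] is a dict
    {j: g(phi, i->j)} over the out-neighbours of i."""
    n = len(vertex_w)
    F = sum(vertex_w)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            Gi = sum(edge_w[i].values())       # normaliser of i's out-weights
            for j in range(n):
                a_ij = c * vertex_w[j] / F     # weight-biased teleportation
                if j in edge_w[i]:
                    a_ij += (1 - c) * edge_w[i][j] / Gi
                nxt[j] += pi[i] * a_ij
        pi = nxt
    return pi
```

With all vertex and edge weights equal, this degenerates to an unweighted rank, so the feature vectors ω and φ are what make the definition non-trivial.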
A new general class of models

Buckley–Osthus is not ideal for clustering. Can one do anything to improve it?

There are many multiparametric models.

A breakthrough is due to Ryabchenko, Samosvat, and Ostroumova.

The PA-class
Let G^n_m (n ≥ n₀) be a graph with n vertices {1, …, n} and mn edges obtained as a result of the following random graph process. We start at time n₀ from an arbitrary graph G^{n₀}_m with n₀ vertices and mn₀ edges. On the (n + 1)-th step (n ≥ n₀), we make the graph G^{n+1}_m from G^n_m by adding a new vertex n + 1 and m edges connecting this vertex to some m vertices from the set {1, …, n, n + 1}. Denote by d^n_v the degree of a vertex v in G^n_m. Assume that for some constants A and B the following conditions are satisfied:
A new general class of models: continuation

The PA-class conditions

P(d^{n+1}_v = d^n_v | G^n_m) = 1 − A·d^n_v/n − B/n + O((d^n_v)²/n²), 1 ≤ v ≤ n; (1)

P(d^{n+1}_v = d^n_v + 1 | G^n_m) = A·d^n_v/n + B/n + O((d^n_v)²/n²), 1 ≤ v ≤ n; (2)

P(d^{n+1}_v = d^n_v + j | G^n_m) = O((d^n_v)²/n²), 2 ≤ j ≤ m, 1 ≤ v ≤ n; (3)

P(d^{n+1}_{n+1} = m + j) = O(1/n), 1 ≤ j ≤ m. (4)

For A = 1/2, B = 0, we get Bollobas–Riordan.

For A = 1/(1 + a), B = ma/(1 + a), we get Buckley–Osthus.
A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)
W.h.p.

|{v ∈ G^n_m : deg v = d}| / n ∼ const(A, B, m) / d^{1+1/A}.

Theorem (Ostroumova, Ryabchenko, Samosvat)
If 2A < 1, then w.h.p. T(n) ∼ c(A, B, m);
if 2A = 1, then w.h.p. T(n) ∼ c′(A, B, m)/ln n;
if 2A > 1, then for any ε > 0, w.h.p. n^{1−2A−ε} ≤ T(n) ≤ n^{1−2A+ε}.

Great, since in the first case we have constant clustering together with a power law!

Open questions and questions under solution
(Ostroumova, Krot) The local clustering coefficient is constant in n in the new models and is proportional to 1/d in the degree d.
A new general class of models: results and questions

Theorem (Ostroumova, Ryabchenko, Samosvat)

W.h.p.

|{v ∈ G_m^n : deg v = d}| / n ∼ const(A, B, m) / d^{1+1/A}.

Theorem (Ostroumova, Ryabchenko, Samosvat)

If 2A < 1, then w.h.p. T(n) ∼ c(A, B, m);
if 2A = 1, then w.h.p. T(n) ∼ c′(A, B, m) / ln n;
if 2A > 1, then for any ε > 0, w.h.p. n^{1−2A−ε} ≤ T(n) ≤ n^{1−2A+ε}.

Great, since in the first case we get constant clustering together with a power law!

Open questions and questions under solution

(Ostroumova, Krot) The local clustering coefficient is constant in n in the new models and is proportional to 1/d in the degree d.
What about “Google”, “Yandex”, and other PageRanks in the new models?
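If T(n) denotes the global clustering coefficient (which is consistent with the constant-clustering remark above, though the slides do not define it here), it can be measured on any simulated graph. A minimal sketch (the helper name is mine):

```python
from itertools import combinations

def global_clustering(adj):
    """Global clustering coefficient of a simple undirected graph:
    3 * (#triangles) / (#paths of length 2).
    `adj` maps each vertex to the set of its neighbours."""
    closed = 0    # counts each triangle 3 times (once per vertex)
    cherries = 0  # paths of length 2, centred at each vertex
    for v, nb in adj.items():
        d = len(nb)
        cherries += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return closed / cherries if cherries else 0.0

# A triangle is fully clustered, a path of length 2 is not:
print(global_clustering({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}))  # 1.0
print(global_clustering({0: {1}, 1: {0, 2}, 2: {1}}))        # 0.0
```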
“Fresh” models

Another problem of the preferential attachment models: the older a page is, the greater the probability that it gains more and more citations. That is not completely realistic, since a large part of the Internet consists of news and other media.

We need to use an age factor for each page.

Samosvat and Ostroumova proposed a class of such models.

Let us construct a sequence of random graphs {G^n}. For that, consider an integer constant m (the vertex outdegree), an integer function N(n), and a sequence of mutually independent random variables ξ_1, ξ_2, … with some given distribution taking positive values.
“Fresh” models

First, take 2 vertices and 1 edge between them (the graph G_2^n). The first 2 vertices have “inherent qualities” q(1) = ξ_1, q(2) = ξ_2. At the (t+1)-th step (2 ≤ t ≤ n − 1), one vertex and m edges are added to G_t^n:

q(t+1) = ξ_{t+1},    P(t+1 → i) = attr_t(i) / Σ_{j=1}^{t} attr_t(j),

where

attr_t(i) = (1 or q(i)) · (1 or d_t(i)) · (1 or I[i > t − N(n)] or e^{−(t−i)/N(n)})

and d_t(i) is the degree of the vertex i in G_t^n.

Let G^n = G_n^n.
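A minimal sketch of one member of this class, using the quality-times-age-decay attractiveness attr_t(i) = q(i)·e^{−(t−i)/N(n)} from the “or” choices above (vertex numbering from 0, the function name, and the Pareto parameter alpha are my own choices):

```python
import math
import random

def fresh_edges(n, m, N, alpha=2.5, seed=0):
    """'Fresh' model sketch: vertex i has quality q(i) ~ Pareto(alpha),
    and at step t+1 the new vertex attaches each of its m edges to i
    with probability proportional to attr_t(i) = q(i) * exp(-(t-i)/N).
    Returns the edge list as (newer, older) pairs."""
    rng = random.Random(seed)
    q = [rng.paretovariate(alpha) for _ in range(n)]
    edges = [(1, 0)]  # start: two vertices, one edge
    for t in range(2, n):
        # Attractiveness of each existing vertex i < t at this step.
        attr = [q[i] * math.exp(-(t - i) / N) for i in range(t)]
        for _ in range(m):
            i = rng.choices(range(t), weights=attr)[0]
            edges.append((t, i))
    return edges
```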
“Fresh” models

Theorem (Samosvat, Ostroumova)

The most realistic case is when attr_t(i) = q(i) · e^{−(t−i)/N(n)} and the ξ_i have a power-law (say, Pareto) distribution. In this case, not only does the fraction of vertices with a given degree w.h.p. follow a power law with a tunable exponent, but also the quantity e(T), the fraction of edges that connect vertices with age difference greater than T, decreases exponentially, just as in reality.

The model is used in Yandex to substantially improve the crawling algorithm.
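The decay of e(T) can be checked empirically on any simulated edge list; a short sketch (the helper name is mine; vertices are assumed to be numbered by birth time, so the index difference is the age difference):

```python
def age_gap_fraction(edges, T):
    """e(T): fraction of edges whose endpoints differ in age by more than T."""
    return sum(1 for t, i in edges if abs(t - i) > T) / len(edges)

# Example: two of the three edges span an age gap greater than 3.
print(age_gap_fraction([(5, 0), (5, 4), (6, 1)], 3))  # 0.6666666666666666
```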