qmss: structure of networks - uni-lj.si
TRANSCRIPT
'
&
$
%
Konigsberg bridges
Structureof Networks
Vladimir BatageljAnuska Ferligoj
University of Ljubljana
European Science FoundationQMSS Workshop
Ljubljana, Friday, July 1, 2005, 9:00–12:30
version: June 29, 2005 / 05 : 00
V. Batagelj, A. Ferligoj: Structure of Networks 2'
&
$
%
Outline1 Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Clusters, clusterings, partitions, hierarchies . . . . . . . . . . . . . . . . . . . . 2
5 Subgraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
7 Triads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
9 Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
11 Equivalence relations and Partitions . . . . . . . . . . . . . . . . . . . . . . . . 11
12 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
15 Important vertices in network . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
22 Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
24 Islands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
28 Dense groups, cores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
34 Acyclic networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
40 Citation networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
49 Genealogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 1'
&
$
%
Degrees
degree of vertex v, deg(v) = numberof lines with v as end-vertex;indegree of vertex v, indeg(v) =number of lines with v as terminalvertex (end-vertex is both initial andterminal);outdegree of vertex v, outdeg(v) =number of lines with v as initial vertex.
n = 12, m = 23, indeg(e) = 3, outdeg(e) = 5, deg(e) = 6∑v∈V
indeg(v) =∑v∈V
outdeg(v) = |A|+ 2|E|,∑v∈V
deg(v) = 2|L| − |E0|
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 2'
&
$
%
Clusters, clusterings, partitions, hierarchiesA nonempty subset C ⊆ V is called a cluster (group). A nonempty set ofclusters C = {Ci} forms a clustering.
Clustering C = {Ci} is a partition iff
∪C =⋃i
Ci = V in i 6= j ⇒ Ci ∩ Cj = ∅
Clustering C = {Ci} is a hierarchy iff
Ci ∩ Cj ∈ {∅, Ci, Cj}
Hierarchy C = {Ci} is complete, iff ∪C = V; and is basic if for allv ∈ ∪C also {v} ∈ C.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 3'
&
$
%
Contraction of cluster
Contraction of cluster C is called a graph G/C, in which all vertices of thecluster C are replaced by a single vertex, say c. More precisely:
G/C = (V ′,L′), where V ′ = (V \ C) ∪ {c} and L′ consists of linesfrom L that have both end-vertices in V \ C. Beside these it contains alsoa ’star’ with the center c and: arc (v, c), if ∃p ∈ L, u ∈ C : p(v, u);or arc (c, v), if ∃p ∈ L, u ∈ C : p(u, v). There is a loop (c, c) in c if∃p ∈ L, u, v ∈ C : p(u, v).
In a network over graph G we have also to specify how are determined thevalues/weights in the shrunk part of the network. Usually as the sum ormaksimum/minimum of the original values.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 4'
&
$
%
Contracted clusters – international tradePajek - shadow [0.00,1.00]
usacancubhaidomjamtrimexguahonelsniccospancolvenecuperbrabolparchiarguruukiirenetbelluxfraswispaporwgeegepolaushunczeitamatalbyuggrecypbulrumusrfinswenordenicemlisendahnaunirivoguiupvlibsieghatogcamniggabcarchdconzaiugakenburrwasomethsafmaamoralgtunliysudegyirnturirqsyrlebjorisrsauyemkuwafgchamontaikodkorjapindpakbrmsrinepthakmrlaovndvnrmlaphiinsautnze
usa
can
cub
hai
dom
jam
tri
mex
gua
hon
els
nic
cos
pan
col
ven
ecu
per
bra
bol
par
chi
arg
uru
uki
ire net
bel
lux
fra
swi
spa
por
wge
ege
pol
aus
hun
cze
ita mat
alb
yug
gre
cyp
bul
rum
usr
fin swe
nor
den
ice
mli
sen
dah
nau
nir
ivo
gui
upv
lib sie
gha
tog
cam
nig
gab
car
chd
con
zai
uga
ken
bur
rwa
som
eth
saf
maa
mor
alg
tun
liy sud
egy
irn tur
irq syr
leb
jor
isr
sau
yem
kuw
afg
cha
mon
tai
kod
kor
jap
ind
pak
brm
sri
nep
tha
kmr
lao
vnd
vnr
mla
phi
ins
aut
nze
N. America
L. America
S. America
EuropeAfrica
Asia
Australia
Snyder and Kick’s international trade. Matrix display of dense networks.
w(Ci, Cj) =n(Ci, Cj)
n(Ci) · n(Cj)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 5'
&
$
%
Subgraph
A subgraphH = (V ′,L′) of a given graph G = (V,L) is a graph which setof lines is a subset of set of lines of G, L′ ⊆ L, its vertex set is a subset ofset of vertices of G, V ′ ⊆ V , and it contains all end-vertices of L′.
A subgraph can be induced by a given subset of vertices or lines.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 6'
&
$
%
Cut-out – induced subgraph: Snyder and Kick – Africa
mli
sen
dah
nau
nir
ivo
gui
upv
lib
sie
gha
tog
cam
nig
gab
car
chd
con
zai
uga
ken
bur
rwa
som
eth
saf
maa
mor
alg
tun
liy
sud
egy
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 7'
&
$
%
Triads
1 - 003
2 - 012
3 - 102
4 - 021D
5 - 021U
6 - 021C
7 - 111D
8 - 111U
9 - 030T
10 - 030C
11 - 201
12 - 120D
13 - 120U
14 - 120C
15 - 210
16 - 300
Let G = (V, R) be a simple di-rected graph without loops. A triadis a subgraph induced by a given setof three vertices.There are 16 nonisomorphic (typesof) triads. They can be partitionedinto three basic types:
• the null triad 003;
• dyadic triads 012 and 102; and
• connected triads: 111D, 201,210, 300, 021D, 111U, 120D,021U, 030T, 120U, 021C,030C and 120C.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 8'
&
$
%
Triadic spectrum
Moody
Several properties of agraph can be expressedin terms of its triadicspectrum – distributionof all its triads. It alsoprovides ingredients forp∗ network models. Adirect approach to de-termine the triadic spec-trum is of order O(n3);but in most large graphsit can be determinedmuch faster.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 9'
&
$
%
Walkslength |s| of the walk s is the numberof lines it contains.s = (j, h, l, g, e, f, h, l, e, c, b, a)|s| = 11A walk is closed iff its initial and ter-minal vertex coincide.If we don’t consider the direction of thelines in the walk we get a semiwalk orchain.trail – walk with all lines differentpath – walk with all vertices differentcycle – closed walk with all internalvertices different
A graph is acyclic if it doesn’t contain any cycle.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 10'
&
$
%
Shortest pathsA shortest path from u to v is alsocalled a geodesic from u to v. Itslength is denoted by d(u, v).If there is no walk from u to v thend(u, v) =∞.d(j, a) = |(j, h, d, c, b, a)| = 5d(a, j) =∞d(u, v) = max(d(u, v), d(v, u))is a distance:d(v, v) = 0, d(u, v) = d(v, u),d(u, v) ≤ d(u, t) + d(t, v).
The diameter of a graph equals to the distance between the most distant pairof vertices: D = maxu,v∈V d(u, v).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 11'
&
$
%
Equivalence relations and PartitionsA relation R on V is an equivalence relation iff it isreflexive ∀v ∈ V : vRv, symmetric ∀u, v ∈ V : uRv ⇒ vRu, andtransitive ∀u, v, z ∈ V : uRz ∧ zRv ⇒ uRv.
Each equivalence relation determines a partition into equivalence classes[v] = {u : vRu}.
Each partition C determines an equivalence relationuRv ⇔ ∃C ∈ C : u ∈ C ∧ v ∈ C.
k-neighbors of v is the set of vertices on ’distance’ k from v, Nk(v) ={u ∈ v : d(v, u) = k}.
The set of all k-neighbors, k = 0, 1, ... of v is a partition of V .
k-neighborhood of v, N (k)(v) = {u ∈ v : d(v, u) ≤ k}.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 12'
&
$
%
Connectivity
Vertex u is reachable from vertex v iffthere exists a walk with initial vertex v
and terminal vertex u.Vertex v is weakly connected with ver-tex u iff there exists a semiwalk with v
and u as its end-vertices.Vertex v is strongly connected with ver-tex u iff they are mutually reachable.
Weak and strong connectivity are equivalence relations.
Equivalence classes induce weak/strong components.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 13'
&
$
%
Weak components
Reordering the vertices of networksuch that the vertices from the sameclass of weak partition are put to-gether we get a matrix representa-tion consisting of diagonal blocks –weak components.Most problems can be solved sepa-rately on each component and after-ward these solutions combined intofinal solution.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 14'
&
$
%
Reduction (condensation)
If we shrink every strong component of a given graph into a vertex, deleteall loops and identify parallel arcs the obtained reduced graph is acyclic.For every acyclic graph an ordering / level function i : V → IN exists s.t.(u, v) ∈ A ⇒ i(u) < i(v).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 15'
&
$
%
Important vertices in networkIt seems that the most important distinction between different vertex indicesis based on the view/decision whether the network is considered directed orundirected. This gives us two main types of indices:
• directed case: measures of importance; with two subgroups: measuresof influence, based on out-going arcs; and measures of support, basedon incoming arcs;
• undirected case: measures of centrality, based on all lines.
For undirected networks all three types of measures coincide.
If we change the direction of all arcs (replace the relation with its inverserelation) the measure of influence becomes a measure of support, and viceversa.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 16'
&
$
%
. . . Important vertices in network
The real meaning of measure of importance depends on the relationdescribed by a network. For example the most ’important’ person for therelation ’ doesn’t like to work with ’ is in fact the least popular person.
Removal of an important vertex from a network produces a substantialchange in structure/functioning of the network.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 17'
&
$
%
Degrees
The simplest index are the degrees of vertices. Since for simple networksdegmin = 0 and degmax = n − 1, the corresponding normalized indicesare
centrality deg′(v) =deg(v)n− 1
and similary
support indeg′(v) =indeg(v)
n
influence outdeg′(v) =outdeg(v)
n
Instead of degrees in original network we can consider also the degrees withrespect to the reachability relation (transitive closure).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 18'
&
$
%
Closeness
Most indices are based on the distance d(u, v) between vertices in a networkN = (V,L). Two such indices are
radius r(v) = maxu∈V d(v, u)
total closeness S(v) =∑
u∈V d(v, u)
These two indices are measures of influence – to get measures of supportwe have to replace in definitions d(u, v) with d(v, u).
If the network is not strongly connected rmax and Smax equal∞. Sabidussi(1966) introduced a related measure 1/S(v); or in its normalized form
closeness cl(v) =n− 1∑
u∈V d(v, u)
D = maxu,v∈V d(v, u) is called the diameter of network.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 19'
&
$
%
Betweenness
Important are also the vertices that can control the information flow in thenetwork. If we assume that this flow uses only the shortest paths (geodesics)we get a measure of betweeness (Anthonisse 1971, Freeman 1977)
b(v) =1
(n− 1)(n− 2)
∑u,t∈V:gu,t>0u 6=v,t6=v,u6=t
gu,t(v)gu,t
where gu,t is the number of geodesics from u to t; and gu,t(v) is the numberof those among them that pass through vertex v.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 20'
&
$
%
Hubs and authoritiesTo each vertex v of a network N = (V,L) we assign two values: quality ofits content (authority) xv and quality of its references (hub) yv .
A good authority is selected by good hubs; and good hub points to goodauthorities (see Kleinberg).
xv =∑
u:(u,v)∈L
yu and yv =∑
u:(v,u)∈L
xu
Let W be a matrix of network N and x and y authority and hub vectors. Then wecan write these two relations as x = WT y and y = Wx.
We start with y = [1, 1, . . . , 1] and then compute new vectors x and y. After eachstep we normalize both vectors. We repeat this until they stabilize. We can showthat this procedure converges. The limit vector x∗ is the principal eigen vector ofmatrix WT W; and y∗ of matrix WWT .
Similar procedures are used in search engines on the web to evaluate the importance
of web pages.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 21'
&
$
%
Clustering coefficientLet G = (V, E) be simple undirected graph. Clustering in vertex v isusually measured as a quotient between the number of lines in subgraphG1(v) = G(N1(v)) induced by the neighbors of vertex v and the number oflines in the complete graph on these vertices:
C(v) =
2|L(G1(v))|
deg(v)(deg(v)− 1)deg(v) > 1
0 otherwise
We can consider also the size of vertex neighborhood by the followingcorrection
C1(v) =deg(v)
∆C(v)
where ∆ is the maximum degree in graph G. This measure attains its largestvalue in vertices that belong to an isolated clique of size ∆.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 22'
&
$
%
CutsThe standard approach to find interesting groups inside a network was basedon properties/weights – they can be measured or computed from networkstructure (for example Kleinberg’s hubs and authorities).
The vertex-cut of a network N = (V,L, p), p : V → IR, at selected level t
is a subnetwork N (t) = (V ′,L(V ′), p), determined by the set
V ′ = {v ∈ V : p(v) ≥ t}
and L(V ′) is the set of lines from L that have both endpoints in V ′.
The line-cut of a network N = (V,L, w), w : L → IR, at selected level t isa subnetwork N (t) = (V(L′),L′, w), determined by the set
L′ = {e ∈ L : w(e) ≥ t}
and V(L′) is the set of all endpoints of the lines from L′.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 23'
&
$
%
Citation weights
PFAFFELHUBER-E-1975-V18-P217
POGGIO-T-1975-V19-P201
KOHONEN-T-1976-V21-P85
KOHONEN-T-1976-V22-P159
AMARI-SI-1977-V26-P175
KOHONEN-T-1977-V2-P1065
ANDERSON-JA-1977-V84-P413
WOOD-CC-1978-V85-P582
COOPER-LN-1979-V33-P9
PALM-G-1980-V36-P19
AMARI-S-1980-V42-P339SUTTON-RS-1981-V88-P135
KOHONEN-T-1982-V43-P59
BIENENSTOCK-EL-1982-V2-P32
HOPFIELD-JJ-1982-V79-P2554
ANDERSON-JA-1983-V13-P799
KNAPP-AG-1984-V10-P616
MCCLELLAND-JL-1985-V114-P159
HECHTNIELSEN-R-1987-V26-P1892
HECHTNIELSEN-R-1987-V26-P4979
GROSSBERG-S-1987-V11-P23
CARPENTER-GA-1987-V37-P54
GROSSBERG-S-1988-V1-P17
HECHTNIELSEN-R-1988-V1-P131SEJNOWSKI-TJ-1988-V241-P1299
BROWN-TH-1988-V242-P724
BROWN-TH-1990-V13-P475
KOHONEN-T-1990-V78-P1464
TREVES-A-1991-V2-P371
HASSELMO-ME-1993-V16-P218
BARKAI-E-1994-V72-P659
HASSELMO-ME-1994-V14-P3898
HASSELMO-ME-1994-V7-P13
HASSELMO-ME-1995-V67-P1
HASSELMO-ME-1995-V15-P5249
GLUCK-MA-1997-V48-P481
ASHBY-FG-1999-V6-P363
Pajek
The citation network analysisstarted in 1964 with the paper ofGarfield et al. In 1989 Hummonand Doreian proposed threeindices – weights of arcs that areproportional to the number ofdifferent source-sink paths passingthrough the arc. We developedalgorithms to efficiently computethese indices.Main subnetwork (arc cut at level0.007) of the SOM (selforganizingmaps) citation network (4470 ver-tices, 12731 arcs).See paper.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 24'
&
$
%
IslandsIf we represent a given or computed value of vertices / lines as a height ofvertices / lines and we immerse the network into a water up to selected levelwe get islands. Varying the level we get different islands. Islands are verygeneral and efficient approach to determine the ’important’ subnetworks ina given network.
We developed very efficient algorithms to determine the islands hierarchyand to list all the islands of selected sizes.
See details.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 25'
&
$
%
. . . Islands
A set of vertices C ⊆ V is a regular vertex island in networkN = (V,L, p),p : V → IR iff it induces a connected subgraph and the vertices from theisland are ’higher’ than the neighboring vertices
maxu∈N(C)
p(u) < minv∈c
p(v)
A set of vertices C ⊆ V is a regular line island in network N = (V,L, w),w : L → IR iff it induces a connected subgraph and the lines inside theisland are ’stronger related’ among them than with the neighboring vertices– in N there exists a spanning tree T over C such that
max(u,v)∈L,u/∈C,v∈C
w(u, v) < min(u,v)∈T
w(u, v)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 26'
&
$
%
Islands - Reuters terror news
110-storyact
action
afghanistan
africa
agent
aid
air
air_force
airline
airliner
airport
american
american_airlines
anthrax
anti-american
apparent
arab
arabic-language
attack
attendant
barksdale
base
bin_laden
boston
buildng
business
call
car
case
cell
center
chemical
cheyenne
chief
city
commercial
conference
congressionalcontain
country
deadly
death
debris
dept
dissident
district
east
edmund
edward
effort
embassy
emergency
exchange
fbi
financialfire
firefighter
flight
florida
force
group
headquarters
help
herald
hijack
hijacker
inhalejet
jonn
knife-wielding
landmark
large
late
leader
louisiana
man
manual
market
mayor
mayor_giuliani
member
mighty
military
miss
morning
nebraska
necessary
new_york
news
newspaper
north
nuclear
officerofficial
offutt
organization
pakistan
pakistani
passenger
pentagon
people
pfc
phone
pilot
plane
plant
plaugher
plea
police
postal
power
rental
rescue
responsibility
saudi
saudi-born
scare
service
skin
smoke
south
space
special
specialist
state
stock
strike
support
suspecttaliban
team
terror
terrorism
terroristthe_worst
thousand
thursday
toll
tower
trace
train
tuesday
twin
uniform
united_airlines
united_stateswar
washington
weapon
wednesdayweek
worker
world
world_trade_ctr
wyoming
Pajek
Using CRA S. Cormanand K. Dooley producedthe Reuters terror newsnetwork that is based onall stories released dur-ing 66 consecutive days bythe news agency Reutersconcerning the September11 attack on the US. Thevertices of a network arewords (terms); there is anedge between two wordsiff they appear in the sametext unit. The weight of anedge is its frequency. It hasn = 13332 vertices andm = 243447 edges.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 27'
&
$
%
Islands – The Edinburgh Associative Thesaurusn = 23219, m = 325624, transitivity weight
AGAIN ALREADY
ANYWAY
AS
BUT
HAPPEN
HAPPENED
JUST
MEANWHILE
NEVERTHELESS
NOTWITHSTANDING
NOW
OFTEN
SOMETIME
SOON
THEREFORE
WHY
YET
BELIEVE
COINS
COULD
DEALER
DEFICIT
INCREASE
LOOT
MONEY
MONIES
MORE MONEY
NO
NOT
OFFER
PAID
PAY
PAYMENT
PLEASE
PROBABLY
PROPERTY
PROVIDE
RECEIPT
REFUSE
REPAY
STOLE
THRIFT
THRIFTY
UNPAID
ACTIVITY
EDUCATION
ENGINEERING
HOMEWORK
LEARNING
LECTURER
LECTURES
LESSONS
MATHS
RESEARCH
SCHOOL
SCIENCE
SCIENTIFIC STUDY
STUDYING
TEACHER
TEACHING
TRAINING
WORK
ADORABLE
ATTRACTIVE
BEAUTIFUL
BELOVED
BOSS
CHAIRMAN
CHARM
DELIGHTFUL
ELEGANCE
FLIRT
GIRL
HAIRY
INHUMAN
KINDNESS
LOVE
LOVELY
MAN
MYSTERIOUS NICE
POWERFUL
PROFESSION
RESPONSIBLE
SHAPELY
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 28'
&
$
%
Dense groups, coresSeveral notions were proposed in attempts to formally describe densegroups in graphs.
Clique of order k is a maximal complete subgraph (isomorphic to Kk),k ≥ 3.
s-plexes, LS sets, lambda sets, cores, . . .
For all of them, except for cores, it turned out that they are difficult todetemine.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 29'
&
$
%
CoresThe notion of core was introducedby Seidman in 1983. Let G =(V, E) be a graph. A subgraphH =(W, E|W ) induced by the set W isa k-core or a core of order k iff∀v ∈ W : degH(v) ≥ k, and H isa maximal subgraph with this prop-erty. The core of maximum order isalso called the main core.
The core number of vertex v is the highest order of a core that containsthis vertex. The degree deg(v) can be: in-degree, out-degree, in-degree +out-degree, etc., determining different types of cores.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 30'
&
$
%
Properties of cores
From the figure, representing 0, 1, 2 and 3 core, we can see the followingproperties of cores:
• The cores are nested: i < j =⇒ Hj ⊆ Hi
• Cores are not necessarily connected subgraphs.
An efficient algorithm for determining the cores hierarchy is based on thefollowing property:
If from a given graph G = (V, E) we recursively delete all vertices,and edges incident with them, of degree less than k, the remaininggraph is the k-core.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 31'
&
$
%
6-core of Krebs Internet industries
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 32'
&
$
%
Generalized coresThe notion of core can be generalized to networks. Let N = (V, E , w)be a network, where G = (V, E) is a graph and w : E → IR is a functionassigning values to edges. A vertex property function on N, or a p-function for short, is a function p(v, U), v ∈ V , U ⊆ V with real values.Let NU (v) = N(v) ∩ U . Besides degrees, here are some examples ofp-functions:
pS(v, U) =∑
u∈NU (v)
w(v, u), where w : E → IR+0
pM (v, U) = maxu∈NU (v)
w(v, u), where w : E → IR
pk(v, U) = number of cycles of length k through vertex v in (U, E|U)
The subgraph H = (C, E|C) induced by the set C ⊆ V is a p-core at levelt ∈ IR iff ∀v ∈ C : t ≤ p(v, C) and C is a maximal such set.
For several p-functions an efficient algorithm for p-cores exists.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 33'
&
$
%
pS-core at level 46 of Computational geometry network
L.Guibas
M.Sharir
M.vanKreveld
B.Chazelle
J.Snoeyink
A.Garg
D.Dobkin
F.Preparata
J.Hershberger
C.Yap
J.Boissonnat
O.Schwarzkopf
J.Mitchell
M.Overmars
P.Gupta
R.Pollack
D.Eppstein
M.Goodrich
M.Bern
P.Agarwal
I.Tollis
H.Edelsbrunner
E.Arkin
R.Janardan
M.deBerg
D.Halperin
L.Vismara
M.Smid
G.Toussaint
M.Yvinec
M.Teillaud
S.Suri
R.Klein
E.Welzl
G.Liotta
J.Pach
P.Bose
J.Schwerdt
J.Majhi
J.Czyzowicz
R.Tamassia
B.AronovR.Seidel
J.Urrutia
J.Vitter
J.Matousek
C.Icking
J.O’Rourke
O.Devillers
G.diBattista
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 34'
&
$
%
Acyclic networks
v1v2
v3
v4
v5
v6
v7
v8
v9
v10
v11
acyclic.paj
Network G = (V, R) , R ⊆ V × Vis acyclic, if it doesn’t contain any(proper) cycle.
R ∩ I = ∅
In some cases we allow loops.Examples: citation networks, ge-nealogies, project networks, . . .In real-life acyclic networks weusually have a vertex property p :V → IR (most often time), that iscompatible with arcs
(u, v) ∈ R⇒ p(u) < p(v)
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 35'
&
$
%
Basic properties of acyclic networks
Let G = (V, R) be acyclic and U ⊆ V , then G|U = (U , R|U), R|U =R ∩ U × U is also acyclic.
Let G = (V, R) be acyclic, then G′ = (V, R−1) is also acyclic. Duality.
The set of sources MinR(V) = {v : ¬∃u ∈ V : (u, v) ∈ R} and the setof sinks MaxR(V) = {v : ¬∃u ∈ V : (v, u) ∈ R} are nonempty (in finitenetworks).
Transitive closure R of an acyclic relation R is acyclic.
Relation Q is a skeleton of relation R iff Q ⊆ R, Q = R and relation Q isminimal such relation – no arc can be deleted from it without destroying thesecond property.
A general relation (graph) can have several skeletons; but in a case ofacyclic relation it is uniquely determined Q = R \R ∗R.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 36'
&
$
%
Depth
v13
v25
v36
v42 v52
v66
v75
v81
v94
v101
v114
Compatible mapping h : V → IN+ iscalled depth or level if all differences onthe longest path and the initial value equalto 1.
U ← V; k ← 0while U 6= ∅ doT ← MinR(U); k ← k + 1for v ∈ T do h(v)← k
U ← U \ T
Drawing on levels. Macro Layers.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 37'
&
$
%
Compatible numberings
v14
v27
v311
v43 v58
v610
v79
v82
v96
v101
v115
Injective mapping h : V → 1..|V| compat-ible with relation R is called a compatiblenumbering.’Topological sort’
U ← V; k ← 0while U 6= ∅ do
select v ∈ MinR(U); k ← k + 1h(v)← k
U ← U \ {v}
Matrix display of acyclic network with vertices reordered according to acompatible numbering has a zero lower triangle.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 38'
&
$
%
. . . Compatible numberingsv1
v2
v3
v4
v5
v6
v7
v8
v9
v10
v11
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
v11
v10
v8
v4
v1
v11
v9
v2
v5
v7
v6
v3
v10
v8 v4 v1 v11
v9 v2 v5 v7 v6 v3
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 39'
&
$
%
Compatible numberings and functions on acyclic networks
Let the function f : V → IR be defined in the following way:
• f(v) is knownn in sources v ∈ MinR(V)
• f(v) = F ({f(u) : uRv})
If we compute the values of function f in a sequence determined by acomptible numbering we can compute them in one pass since for eachvertex v ∈ V the values of f needed for its computation are already known.
Example: the earliest time T (v) of completion of all tasks entering the statev in CPM (Critical Path Method).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 40'
&
$
%
Citation networks
The citation network analysisstarted in 1964 with the paper ofGarfield et al. In 1989 Hummonand Doreian proposed three indices– weights of arcs that provide uswith automatic way to identifythe (most) important part of thecitation network. For two of theseindices we developed algorithms toefficiently compute them.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 41'
&
$
%
. . . Citation networks
In a given set of units/vertices U (articles, books, works, etc.) we introducea citing relation/set of arcs R ⊆ U × U
uRv ≡ v cites u
which determines a citation network N = (U , R).
A citing relation is usually irreflexive (no loops) and (almost) acyclic. Weshall assume that it has these two properties. Since in real-life citationnetworks the strong components are small (usually 2 or 3 vertices) wecan transform such network into an acyclic network by shrinking strongcomponents and deleting loops. Other approaches exist. It is also useful totransform a citation network to its standardized form by adding a commonsource vertex s /∈ U and a common sink vertex t /∈ U . The source s islinked by an arc to all minimal elements of R; and all maximal elements ofR are linked to the sink t. We add also the ‘feedback’ arc (t, s).
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 42'
&
$
%
Search path count method
The search path count (SPC)method is based on counters n(u, v)that count the number of differ-ent paths from s to t through thearc (u, v). To compute n(u, v) weintroduce two auxiliary quantities:n−(v) counts the number of differ-ent paths from s to v, and n+(v)counts the number of different pathsfrom v to t.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 43'
&
$
%
Fast algorithm for SPC
It follows by basic principles of combinatorics that
n(u, v) = n−(u) · n+(v), (u, v) ∈ R
where
n−(u) ={
1 u = s∑v:vRu n−(v) otherwise
and
n+(u) ={
1 u = t∑v:uRv n+(v) otherwise
This is the basis of an efficient algorithm for computing n(u, v) – after thetopological sort of the graph we can compute, using the above relationsin topological order, the weights in time of order O(m), m = |R|. Thetopological order ensures that all the quantities in the right sides of theabove equalities are already computed when needed.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 44'
&
$
%
Hummon and Doreian indices and SPC
The Hummon and Doreian indices are defined as follows:
• search path link count (SPLC) method: wl(u, v) equals the numberof “all possible search paths through the network emanating from anorigin node” through the arc (u, v) ∈ R.
• search path node pair (SPNP) method: wp(u, v) “accounts for allconnected vertex pairs along the paths through the arc (u, v) ∈ R”.
We get the SPLC weights by applying the SPC method on the networkobtained from a given standardized network by linking the source s by anarc to each nonminimal vertex from U ; and the SPNP weights by applyingthe SPC method on the network obtained from the SPLC network byadditionally linking by an arc each nonmaximal vertex from U to the sink t.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 45'
&
$
%
Vertex weights
The quantities used to compute the arc weights w can be used also to definethe corresponding vertex weights t
tc(u) = n−(u) · n+(u)
tl(u) = n−l (u) · n+l (u)
tp(u) = n−p (u) · n+p (u)
They are counting the number of paths of selected type through the vertexu.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 46'
&
$
%
Properties of SPC weights
The values of counters n(u, v) form a flow in the citation network – theKirchoff’s vertex law holds: For every vertex u in a standardized citationnetwork incoming flow = outgoing flow:∑
v:vRu
n(v, u) =∑
v:uRv
n(u, v) = n−(u) · n+(u)
The weight n(t, s) equals to the total flow through network and provides anatural normalization of weights
w(u, v) =n(u, v)n(t, s)
⇒ 0 ≤ w(u, v) ≤ 1
and if C is a minimal arc-cut-set∑
(u,v)∈C w(u, v) = 1.
In large networks the values of weights can grow very large. This should beconsidered in the implementation of the algorithms.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 47'
&
$
%
Nonacyclic citation networksIf there is a cycle in a network then there is also an infinite number of trailsbetween some units. There are some standard approaches to overcome theproblem: to introduce some ’aging’ factor which makes the total weightof all trails converge to some finite value; or to restrict the definition of aweight to some finite subset of trails – for example paths or geodesics. But,new problems arise: What is the right value of the ’aging’ factor? Is therean efficient algorithm to count the restricted trails?
The other possibility, since a citation network is usually almost acyclic, isto transform it into an acyclic network
• by identification (shrinking) of cyclic groups (nontrivial strong compo-nents), or
• by deleting some arcs, or
• by transformations such as the ’preprint’ transformation.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 48'
&
$
%
Preprint transformationThe preprint transformationis based on the followingidea: Each paper from astrong component is du-plicated with its ’preprint’version. The papers in-side strong component citepreprints.
Large strong components in citation network are unlikely – their presenceusually indicates an error in the data. An exception from this rule is the HEPcitation network of High Energy Particle Physics literature from arXiv. Init different versions of the same paper are treated as a unit. This leads tolarge strongly connected components. The idea of preprint transformationcan be used also in this case to eliminate cycles.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 49'
&
$
%
GenealogiesAnother example of acyclic networks are genealogies. In ’Sources’ wealready described the following network representations of genealogies:
I wife
fathermother
sisterbrother
son daughter
f-grandfather f-grandmotherm-grandfatherm-grandmother
stepmother
son-in-lawdaughter-in-law
sister-in-law
grandson
sister
grandson
I & wife
father & mother
f-grandfather & f-grandmother m-grandfather & m-grandmother
father & stepmother
son-in-law & daughterson & daughter-in-law
brother & sister-in-law Iwife
father mother
sister brother
son daughter
f-grandfather f-grandmother m-grandfather m-grandmother
stepmother
son-in-lawdaughter-in-law
sister-in-law
grandson
I & wife
father & mother
f-grandfather & f-grandmother m-grandfather & m-grandmother
father & stepmother
son-in-law & daughterson & daughter-in-law
brother & sister-in-law
Ore graph, p-graph, and bipartite p-graph
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 50'
&
$
%
Properties of representations
p-graphs and bipartite p-graphs have many advantages:
• there are less vertices and lines in p-graphs than in corresponding Oregraphs;
• p-graphs are directed, acyclic networks;
• every semi-cycle of the p-graph corresponds to a relinking marriage.There exist two types of relinking marriages: blood marriage: e.g.,marriage among brother and sister, and non-blood marriage: e.g., twobrothers marry two sisters from another family.
• p-graphs are more suitable for analyses.
Bipartite p-graphs have an additional advantage: we can distinguishbetween a married uncle and a remarriage of a father. This property enablesus, for example, to find marriages between half-brothers and half-sisters.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 51'
&
$
%
Pattern searching
If a selected pattern determined by a given graph does not occur frequentlyin a sparse network the straightforward backtracking algorithm applied forpattern searching finds all appearences of the pattern very fast even in thecase of very large networks. Pattern searching was successfully applied tosearching for patterns of atoms in molecula (carbon rings) and searching forrelinking marriages in genealogies.
Damianus/Georgio/Legnussa/Babalio/
Marin/Gondola/Magdalena/Grede/
Nicolinus/Gondola/Franussa/Bona/
Marinus/Bona/Phylippa/Mence/
Sarachin/Bona/Nicoletta/Gondola/
Marinus/Zrieva/Maria/Ragnina/
Lorenzo/Ragnina/Slavussa/Mence/
Junius/Zrieva/Margarita/Bona/
Junius/Georgio/Anucla/Zrieva/
Michael/Zrieva/Francischa/Georgio/
Nicola/Ragnina/Nicoleta/Zrieva/
Three connected relinking marriages in thegenealogy (represented as a p-graph) of ra-gusan noble families. A solid arc indicatesthe is a son of relation, and a dotted arcindicates the is a daughter of relation.In all three patterns a brother and a sisterfrom one family found their partners in thesame other family.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 52'
&
$
%
. . . Pattern searching
To speed up the search or to consider some additional properties of thepattern, a user can set some additional options:
• vertices in network should match with vertices in pattern in somenominal, ordinal or numerical property (for example, type of atom inmolecula);
• values of edges must match (for example, edges representingmale/female links in the case of p-graphs);
• the first vertex in the pattern can be selected only from a given subsetof vertices in the network.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 53'
&
$
%
Relinking patterns in p-graphs
A3
A4.1B4 A4.2
A5.1A5.2
B5
A6.1A6.2
A6.3
C6
B6.1 B6.2 B6.3
B6.4
A2
All possible relinking marriages in p-graphs with 2 to 6 vertices. Patterns arelabeled as follows:
• first character – number of firstvertices: A – single, B – two, C– three.
• second character: number of ver-tices in pattern (2, 3, 4, 5, or 6).
• last character: identifier (if the twofirst characters are identical).
Patterns denoted by A are exactly theblood marriages. In every pattern thenumber of first vertices equals to thenumber of last vertices.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6
V. Batagelj, A. Ferligoj: Structure of Networks 54'
&
$
%
Frequencies normalized with number of couples inp-graph × 1000.
pattern Loka Silba Ragusa Turcs RoyalA2 0.07 0.00 0.00 0.00 0.00A3 0.07 0.00 0.00 0.00 2.64A4.1 0.85 2.26 1.50 159.71 18.45B4 3.82 11.28 10.49 98.28 6.15A4.2 0.00 0.00 0.00 0.00 0.00A5.1 0.64 3.16 2.00 36.86 11.42A5.2 0.00 0.00 0.00 0.00 0.00B5 1.34 4.96 23.48 46.68 7.03A6.1 1.98 12.63 1.00 169.53 11.42A6.2 0.00 0.90 0.00 0.00 0.88A6.3 0.00 0.00 0.00 0.00 0.00C6 0.71 5.41 9.49 36.86 4.39B6.1 0.00 0.45 1.00 0.00 0.00B6.2 1.91 17.59 31.47 130.22 10.54B6.3 3.32 13.53 40.96 113.02 11.42B6.4 0.00 0.00 2.50 7.37 0.00Sum 14.70 72.17 123.88 798.53 84.36
Most of the relinking marriages happened in the genealogy of Turkish nomads; the second is
Ragusa while in other genealogies they are much less frequent.
ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6