qmss: structure of networks - uni-lj.si

'

&

$

%

Konigsberg bridges

Structureof Networks

Vladimir BatageljAnuska Ferligoj

University of Ljubljana

European Science FoundationQMSS Workshop

Ljubljana, Friday, July 1, 2005, 9:00–12:30

version: June 29, 2005 / 05 : 00

V. Batagelj, A. Ferligoj: Structure of Networks 2'

&

$

%

Outline1 Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Clusters, clusterings, partitions, hierarchies . . . . . . . . . . . . . . . . . . . . 2

5 Subgraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

7 Triads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

9 Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

11 Equivalence relations and Partitions . . . . . . . . . . . . . . . . . . . . . . . . 11

12 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

15 Important vertices in network . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

22 Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

24 Islands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

28 Dense groups, cores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

34 Acyclic networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

40 Citation networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

49 Genealogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

ESF, QMSS Workshop, Ljubljana, July 2005 s s y s l s y ss * 6


&

$

%

Degrees

degree of vertex v, deg(v) = numberof lines with v as end-vertex;indegree of vertex v, indeg(v) =number of lines with v as terminalvertex (end-vertex is both initial andterminal);outdegree of vertex v, outdeg(v) =number of lines with v as initial vertex.

n = 12, m = 23, indeg(e) = 3, outdeg(e) = 5, deg(e) = 6∑v∈V

indeg(v) =∑v∈V

outdeg(v) = |A|+ 2|E|,∑v∈V

deg(v) = 2|L| − |E0|



&

$

%

Clusters, clusterings, partitions, hierarchiesA nonempty subset C ⊆ V is called a cluster (group). A nonempty set ofclusters C = {Ci} forms a clustering.

Clustering C = {Ci} is a partition iff

∪C =⋃i

Ci = V in i 6= j ⇒ Ci ∩ Cj = ∅

Clustering C = {Ci} is a hierarchy iff

Ci ∩ Cj ∈ {∅, Ci, Cj}

Hierarchy C = {Ci} is complete, iff ∪C = V; and is basic if for allv ∈ ∪C also {v} ∈ C.



&

$

%

Contraction of cluster

Contraction of cluster C is called a graph G/C, in which all vertices of thecluster C are replaced by a single vertex, say c. More precisely:

G/C = (V ′,L′), where V ′ = (V \ C) ∪ {c} and L′ consists of linesfrom L that have both end-vertices in V \ C. Beside these it contains alsoa ’star’ with the center c and: arc (v, c), if ∃p ∈ L, u ∈ C : p(v, u);or arc (c, v), if ∃p ∈ L, u ∈ C : p(u, v). There is a loop (c, c) in c if∃p ∈ L, u, v ∈ C : p(u, v).

In a network over graph G we have also to specify how are determined thevalues/weights in the shrunk part of the network. Usually as the sum ormaksimum/minimum of the original values.



&

$

%

Contracted clusters – international tradePajek - shadow [0.00,1.00]

usacancubhaidomjamtrimexguahonelsniccospancolvenecuperbrabolparchiarguruukiirenetbelluxfraswispaporwgeegepolaushunczeitamatalbyuggrecypbulrumusrfinswenordenicemlisendahnaunirivoguiupvlibsieghatogcamniggabcarchdconzaiugakenburrwasomethsafmaamoralgtunliysudegyirnturirqsyrlebjorisrsauyemkuwafgchamontaikodkorjapindpakbrmsrinepthakmrlaovndvnrmlaphiinsautnze

usa

can

cub

hai

dom

jam

tri

mex

gua

hon

els

nic

cos

pan

col

ven

ecu

per

bra

bol

par

chi

arg

uru

uki

ire net

bel

lux

fra

swi

spa

por

wge

ege

pol

aus

hun

cze

ita mat

alb

yug

gre

cyp

bul

rum

usr

fin swe

nor

den

ice

mli

sen

dah

nau

nir

ivo

gui

upv

lib sie

gha

tog

cam

nig

gab

car

chd

con

zai

uga

ken

bur

rwa

som

eth

saf

maa

mor

alg

tun

liy sud

egy

irn tur

irq syr

leb

jor

isr

sau

yem

kuw

afg

cha

mon

tai

kod

kor

jap

ind

pak

brm

sri

nep

tha

kmr

lao

vnd

vnr

mla

phi

ins

aut

nze

N. America

L. America

S. America

EuropeAfrica

Asia

Australia

Snyder and Kick’s international trade. Matrix display of dense networks.

w(Ci, Cj) =n(Ci, Cj)

n(Ci) · n(Cj)


http://vlado.fmf.uni-lj.si/pub/networks/data/mix/SKtrade.zip


&

$

%

Subgraph

A subgraphH = (V ′,L′) of a given graph G = (V,L) is a graph which setof lines is a subset of set of lines of G, L′ ⊆ L, its vertex set is a subset ofset of vertices of G, V ′ ⊆ V , and it contains all end-vertices of L′.

A subgraph can be induced by a given subset of vertices or lines.



&

$

%

Cut-out – induced subgraph: Snyder and Kick – Africa

mli

sen

dah

nau

nir

ivo

gui

upv

lib

sie

gha

tog

cam

nig

gab

car

chd

con

zai

uga

ken

bur

rwa

som

eth

saf

maa

mor

alg

tun

liy

sud

egy



&

$

%

Triads

1 - 003

2 - 012

3 - 102

4 - 021D

5 - 021U

6 - 021C

7 - 111D

8 - 111U

9 - 030T

10 - 030C

11 - 201

12 - 120D

13 - 120U

14 - 120C

15 - 210

16 - 300

Let G = (V, R) be a simple di-rected graph without loops. A triadis a subgraph induced by a given setof three vertices.There are 16 nonisomorphic (typesof) triads. They can be partitionedinto three basic types:

• the null triad 003;

• dyadic triads 012 and 102; and

• connected triads: 111D, 201,210, 300, 021D, 111U, 120D,021U, 030T, 120U, 021C,030C and 120C.



&

$

%

Triadic spectrum

Moody

Several properties of agraph can be expressedin terms of its triadicspectrum – distributionof all its triads. It alsoprovides ingredients forp∗ network models. Adirect approach to de-termine the triadic spec-trum is of order O(n3);but in most large graphsit can be determinedmuch faster.


http://www.soc.sbs.ohio-state.edu/jwm/s884/notes/class_8.ppt

http://kentucky.psych.uiuc.edu/pstar/index.html


&

$

%

Walkslength |s| of the walk s is the numberof lines it contains.s = (j, h, l, g, e, f, h, l, e, c, b, a)|s| = 11A walk is closed iff its initial and ter-minal vertex coincide.If we don’t consider the direction of thelines in the walk we get a semiwalk orchain.trail – walk with all lines differentpath – walk with all vertices differentcycle – closed walk with all internalvertices different

A graph is acyclic if it doesn’t contain any cycle.



&

$

%

Shortest pathsA shortest path from u to v is alsocalled a geodesic from u to v. Itslength is denoted by d(u, v).If there is no walk from u to v thend(u, v) =∞.d(j, a) = |(j, h, d, c, b, a)| = 5d(a, j) =∞d(u, v) = max(d(u, v), d(v, u))is a distance:d(v, v) = 0, d(u, v) = d(v, u),d(u, v) ≤ d(u, t) + d(t, v).

The diameter of a graph equals to the distance between the most distant pairof vertices: D = maxu,v∈V d(u, v).



&

$

%

Equivalence relations and PartitionsA relation R on V is an equivalence relation iff it isreflexive ∀v ∈ V : vRv, symmetric ∀u, v ∈ V : uRv ⇒ vRu, andtransitive ∀u, v, z ∈ V : uRz ∧ zRv ⇒ uRv.

Each equivalence relation determines a partition into equivalence classes[v] = {u : vRu}.

Each partition C determines an equivalence relationuRv ⇔ ∃C ∈ C : u ∈ C ∧ v ∈ C.

k-neighbors of v is the set of vertices on ’distance’ k from v, Nk(v) ={u ∈ v : d(v, u) = k}.

The set of all k-neighbors, k = 0, 1, ... of v is a partition of V .

k-neighborhood of v, N (k)(v) = {u ∈ v : d(v, u) ≤ k}.



&

$

%

Connectivity

Vertex u is reachable from vertex v iffthere exists a walk with initial vertex v

and terminal vertex u.Vertex v is weakly connected with ver-tex u iff there exists a semiwalk with v

and u as its end-vertices.Vertex v is strongly connected with ver-tex u iff they are mutually reachable.

Weak and strong connectivity are equivalence relations.

Equivalence classes induce weak/strong components.



&

$

%

Weak components

Reordering the vertices of networksuch that the vertices from the sameclass of weak partition are put to-gether we get a matrix representa-tion consisting of diagonal blocks –weak components.Most problems can be solved sepa-rately on each component and after-ward these solutions combined intofinal solution.



&

$

%

Reduction (condensation)

If we shrink every strong component of a given graph into a vertex, deleteall loops and identify parallel arcs the obtained reduced graph is acyclic.For every acyclic graph an ordering / level function i : V → IN exists s.t.(u, v) ∈ A ⇒ i(u) < i(v).



&

$

%

Important vertices in networkIt seems that the most important distinction between different vertex indicesis based on the view/decision whether the network is considered directed orundirected. This gives us two main types of indices:

• directed case: measures of importance; with two subgroups: measuresof influence, based on out-going arcs; and measures of support, basedon incoming arcs;

• undirected case: measures of centrality, based on all lines.

For undirected networks all three types of measures coincide.

If we change the direction of all arcs (replace the relation with its inverserelation) the measure of influence becomes a measure of support, and viceversa.



&

$

%

. . . Important vertices in network

The real meaning of measure of importance depends on the relationdescribed by a network. For example the most ’important’ person for therelation ’ doesn’t like to work with ’ is in fact the least popular person.

Removal of an important vertex from a network produces a substantialchange in structure/functioning of the network.



&

$

%

Degrees

The simplest index are the degrees of vertices. Since for simple networksdegmin = 0 and degmax = n − 1, the corresponding normalized indicesare

centrality deg′(v) =deg(v)n− 1

and similary

support indeg′(v) =indeg(v)

n

influence outdeg′(v) =outdeg(v)

n

Instead of degrees in original network we can consider also the degrees withrespect to the reachability relation (transitive closure).



&

$

%

Closeness

Most indices are based on the distance d(u, v) between vertices in a networkN = (V,L). Two such indices are

radius r(v) = maxu∈V d(v, u)

total closeness S(v) =∑

u∈V d(v, u)

These two indices are measures of influence – to get measures of supportwe have to replace in definitions d(u, v) with d(v, u).

If the network is not strongly connected rmax and Smax equal∞. Sabidussi(1966) introduced a related measure 1/S(v); or in its normalized form

closeness cl(v) =n− 1∑

u∈V d(v, u)

D = maxu,v∈V d(v, u) is called the diameter of network.



&

$

%

Betweenness

Important are also the vertices that can control the information flow in thenetwork. If we assume that this flow uses only the shortest paths (geodesics)we get a measure of betweeness (Anthonisse 1971, Freeman 1977)

b(v) =1

(n− 1)(n− 2)

∑u,t∈V:gu,t>0u 6=v,t6=v,u6=t

gu,t(v)gu,t

where gu,t is the number of geodesics from u to t; and gu,t(v) is the numberof those among them that pass through vertex v.



&

$

%

Hubs and authoritiesTo each vertex v of a network N = (V,L) we assign two values: quality ofits content (authority) xv and quality of its references (hub) yv .

A good authority is selected by good hubs; and good hub points to goodauthorities (see Kleinberg).

xv =∑

u:(u,v)∈L

yu and yv =∑

u:(v,u)∈L

xu

Let W be a matrix of network N and x and y authority and hub vectors. Then wecan write these two relations as x = WT y and y = Wx.

We start with y = [1, 1, . . . , 1] and then compute new vectors x and y. After eachstep we normalize both vectors. We repeat this until they stabilize. We can showthat this procedure converges. The limit vector x∗ is the principal eigen vector ofmatrix WT W; and y∗ of matrix WWT .

Similar procedures are used in search engines on the web to evaluate the importance

of web pages.


http://citeseer.ist.psu.edu/kleinberg99authoritative.html


&

$

%

Clustering coefficientLet G = (V, E) be simple undirected graph. Clustering in vertex v isusually measured as a quotient between the number of lines in subgraphG1(v) = G(N1(v)) induced by the neighbors of vertex v and the number oflines in the complete graph on these vertices:

C(v) =

2|L(G1(v))|

deg(v)(deg(v)− 1)deg(v) > 1

0 otherwise

We can consider also the size of vertex neighborhood by the followingcorrection

C1(v) =deg(v)

∆C(v)

where ∆ is the maximum degree in graph G. This measure attains its largestvalue in vertices that belong to an isolated clique of size ∆.



&

$

%

CutsThe standard approach to find interesting groups inside a network was basedon properties/weights – they can be measured or computed from networkstructure (for example Kleinberg’s hubs and authorities).

The vertex-cut of a network N = (V,L, p), p : V → IR, at selected level t

is a subnetwork N (t) = (V ′,L(V ′), p), determined by the set

V ′ = {v ∈ V : p(v) ≥ t}

and L(V ′) is the set of lines from L that have both endpoints in V ′.

The line-cut of a network N = (V,L, w), w : L → IR, at selected level t isa subnetwork N (t) = (V(L′),L′, w), determined by the set

L′ = {e ∈ L : w(e) ≥ t}

and V(L′) is the set of all endpoints of the lines from L′.


http://www.cs.cornell.edu/home/kleinber/


&

$

%

Citation weights

PFAFFELHUBER-E-1975-V18-P217

POGGIO-T-1975-V19-P201

KOHONEN-T-1976-V21-P85


AMARI-SI-1977-V26-P175


ANDERSON-JA-1977-V84-P413

WOOD-CC-1978-V85-P582

COOPER-LN-1979-V33-P9

PALM-G-1980-V36-P19

AMARI-S-1980-V42-P339SUTTON-RS-1981-V88-P135


BIENENSTOCK-EL-1982-V2-P32

HOPFIELD-JJ-1982-V79-P2554

ANDERSON-JA-1983-V13-P799

KNAPP-AG-1984-V10-P616

MCCLELLAND-JL-1985-V114-P159

HECHTNIELSEN-R-1987-V26-P1892

HECHTNIELSEN-R-1987-V26-P4979

GROSSBERG-S-1987-V11-P23

CARPENTER-GA-1987-V37-P54

GROSSBERG-S-1988-V1-P17

HECHTNIELSEN-R-1988-V1-P131SEJNOWSKI-TJ-1988-V241-P1299

BROWN-TH-1988-V242-P724

BROWN-TH-1990-V13-P475


TREVES-A-1991-V2-P371

HASSELMO-ME-1993-V16-P218

BARKAI-E-1994-V72-P659





GLUCK-MA-1997-V48-P481

ASHBY-FG-1999-V6-P363

Pajek

The citation network analysisstarted in 1964 with the paper ofGarfield et al. In 1989 Hummonand Doreian proposed threeindices – weights of arcs that areproportional to the number ofdifferent source-sink paths passingthrough the arc. We developedalgorithms to efficiently computethese indices.Main subnetwork (arc cut at level0.007) of the SOM (selforganizingmaps) citation network (4470 ver-tices, 12731 arcs).See paper.


http://arxiv.org/abs/cs/0309023


&

$

%

IslandsIf we represent a given or computed value of vertices / lines as a height ofvertices / lines and we immerse the network into a water up to selected levelwe get islands. Varying the level we get different islands. Islands are verygeneral and efficient approach to determine the ’important’ subnetworks ina given network.

We developed very efficient algorithms to determine the islands hierarchyand to list all the islands of selected sizes.

See details.


http://vlado.fmf.uni-lj.si/pub/networks/doc/mix/islandsUK.pdf


&

$

%

. . . Islands

A set of vertices C ⊆ V is a regular vertex island in networkN = (V,L, p),p : V → IR iff it induces a connected subgraph and the vertices from theisland are ’higher’ than the neighboring vertices

maxu∈N(C)

p(u) < minv∈c

p(v)

A set of vertices C ⊆ V is a regular line island in network N = (V,L, w),w : L → IR iff it induces a connected subgraph and the lines inside theisland are ’stronger related’ among them than with the neighboring vertices– in N there exists a spanning tree T over C such that

max(u,v)∈L,u/∈C,v∈C

w(u, v) < min(u,v)∈T

w(u, v)



&

$

%

Islands - Reuters terror news

110-storyact

action

afghanistan

africa

agent

aid

air

air_force

airline

airliner

airport

american

american_airlines

anthrax

anti-american

apparent

arab

arabic-language

attack

attendant

barksdale

base

bin_laden

boston

buildng

business

call

car

case

cell

center

chemical

cheyenne

chief

city

commercial

conference

congressionalcontain

country

deadly

death

debris

dept

dissident

district

east

edmund

edward

effort

embassy

emergency

exchange

fbi

financialfire

firefighter

flight

florida

force

group

headquarters

help

herald

hijack

hijacker

inhalejet

jonn

knife-wielding

landmark

large

late

leader

louisiana

man

manual

market

mayor

mayor_giuliani

member

mighty

military

miss

morning

nebraska

necessary

new_york

news

newspaper

north

nuclear

officerofficial

offutt

organization

pakistan

pakistani

passenger

pentagon

people

pfc

phone

pilot

plane

plant

plaugher

plea

police

postal

power

rental

rescue

responsibility

saudi

saudi-born

scare

service

skin

smoke

south

space

special

specialist

state

stock

strike

support

suspecttaliban

team

terror

terrorism

terroristthe_worst

thousand

thursday

toll

tower

trace

train

tuesday

twin

uniform

united_airlines

united_stateswar

washington

weapon

wednesdayweek

worker

world

world_trade_ctr

wyoming

Pajek

Using CRA S. Cormanand K. Dooley producedthe Reuters terror newsnetwork that is based onall stories released dur-ing 66 consecutive days bythe news agency Reutersconcerning the September11 attack on the US. Thevertices of a network arewords (terms); there is anedge between two wordsiff they appear in the sametext unit. The weight of anedge is its frequency. It hasn = 13332 vertices andm = 243447 edges.


http://locks.asu.edu/terror/


&

$

%

Islands – The Edinburgh Associative Thesaurusn = 23219, m = 325624, transitivity weight

AGAIN ALREADY

ANYWAY

AS

BUT

HAPPEN

HAPPENED

JUST

MEANWHILE

NEVERTHELESS

NOTWITHSTANDING

NOW

OFTEN

SOMETIME

SOON

THEREFORE

WHY

YET

BELIEVE

COINS

COULD

DEALER

DEFICIT

INCREASE

LOOT

MONEY

MONIES

MORE MONEY

NO

NOT

OFFER

PAID

PAY

PAYMENT

PLEASE

PROBABLY

PROPERTY

PROVIDE

RECEIPT

REFUSE

REPAY

STOLE

THRIFT

THRIFTY

UNPAID

ACTIVITY

EDUCATION

ENGINEERING

HOMEWORK

LEARNING

LECTURER

LECTURES

LESSONS

MATHS

RESEARCH

SCHOOL

SCIENCE

SCIENTIFIC STUDY

STUDYING

TEACHER

TEACHING

TRAINING

WORK

ADORABLE

ATTRACTIVE

BEAUTIFUL

BELOVED

BOSS

CHAIRMAN

CHARM

DELIGHTFUL

ELEGANCE

FLIRT

GIRL

HAIRY

INHUMAN

KINDNESS

LOVE

LOVELY

MAN

MYSTERIOUS NICE

POWERFUL

PROFESSION

RESPONSIBLE

SHAPELY



&

$

%

Dense groups, coresSeveral notions were proposed in attempts to formally describe densegroups in graphs.

Clique of order k is a maximal complete subgraph (isomorphic to Kk),k ≥ 3.

s-plexes, LS sets, lambda sets, cores, . . .

For all of them, except for cores, it turned out that they are difficult todetemine.



&

$

%

CoresThe notion of core was introducedby Seidman in 1983. Let G =(V, E) be a graph. A subgraphH =(W, E|W ) induced by the set W isa k-core or a core of order k iff∀v ∈ W : degH(v) ≥ k, and H isa maximal subgraph with this prop-erty. The core of maximum order isalso called the main core.

The core number of vertex v is the highest order of a core that containsthis vertex. The degree deg(v) can be: in-degree, out-degree, in-degree +out-degree, etc., determining different types of cores.



&

$

%

Properties of cores

From the figure, representing 0, 1, 2 and 3 core, we can see the followingproperties of cores:

• The cores are nested: i < j =⇒ Hj ⊆ Hi

• Cores are not necessarily connected subgraphs.

An efficient algorithm for determining the cores hierarchy is based on thefollowing property:

If from a given graph G = (V, E) we recursively delete all vertices,and edges incident with them, of degree less than k, the remaininggraph is the k-core.



&

$

%

6-core of Krebs Internet industries



&

$

%

Generalized coresThe notion of core can be generalized to networks. Let N = (V, E , w)be a network, where G = (V, E) is a graph and w : E → IR is a functionassigning values to edges. A vertex property function on N, or a p-function for short, is a function p(v, U), v ∈ V , U ⊆ V with real values.Let NU (v) = N(v) ∩ U . Besides degrees, here are some examples ofp-functions:

pS(v, U) =∑

u∈NU (v)

w(v, u), where w : E → IR+0

pM (v, U) = maxu∈NU (v)

w(v, u), where w : E → IR

pk(v, U) = number of cycles of length k through vertex v in (U, E|U)

The subgraph H = (C, E|C) induced by the set C ⊆ V is a p-core at levelt ∈ IR iff ∀v ∈ C : t ≤ p(v, C) and C is a maximal such set.

For several p-functions an efficient algorithm for p-cores exists.



&

$

%

pS-core at level 46 of Computational geometry network

L.Guibas

M.Sharir

M.vanKreveld

B.Chazelle

J.Snoeyink

A.Garg

D.Dobkin

F.Preparata

J.Hershberger

C.Yap

J.Boissonnat

O.Schwarzkopf

J.Mitchell

M.Overmars

P.Gupta

R.Pollack

D.Eppstein

M.Goodrich

M.Bern

P.Agarwal

I.Tollis

H.Edelsbrunner

E.Arkin

R.Janardan

M.deBerg

D.Halperin

L.Vismara

M.Smid

G.Toussaint

M.Yvinec

M.Teillaud

S.Suri

R.Klein

E.Welzl

G.Liotta

J.Pach

P.Bose

J.Schwerdt

J.Majhi

J.Czyzowicz

R.Tamassia

B.AronovR.Seidel

J.Urrutia

J.Vitter

J.Matousek

C.Icking

J.O’Rourke

O.Devillers

G.diBattista



&

$

%

Acyclic networks

v1v2

v3

v4

v5

v6

v7

v8

v9

v10

v11

acyclic.paj

Network G = (V, R) , R ⊆ V × Vis acyclic, if it doesn’t contain any(proper) cycle.

R ∩ I = ∅

In some cases we allow loops.Examples: citation networks, ge-nealogies, project networks, . . .In real-life acyclic networks weusually have a vertex property p :V → IR (most often time), that iscompatible with arcs

(u, v) ∈ R⇒ p(u) < p(v)


http://vlado.fmf.uni-lj.si/vlado/podstat/AO/net/acyclic.paj


&

$

%

Basic properties of acyclic networks

Let G = (V, R) be acyclic and U ⊆ V , then G|U = (U , R|U), R|U =R ∩ U × U is also acyclic.

Let G = (V, R) be acyclic, then G′ = (V, R−1) is also acyclic. Duality.

The set of sources MinR(V) = {v : ¬∃u ∈ V : (u, v) ∈ R} and the setof sinks MaxR(V) = {v : ¬∃u ∈ V : (v, u) ∈ R} are nonempty (in finitenetworks).

Transitive closure R of an acyclic relation R is acyclic.

Relation Q is a skeleton of relation R iff Q ⊆ R, Q = R and relation Q isminimal such relation – no arc can be deleted from it without destroying thesecond property.

A general relation (graph) can have several skeletons; but in a case ofacyclic relation it is uniquely determined Q = R \R ∗R.



&

$

%

Depth

v13

v25

v36

v42 v52

v66

v75

v81

v94

v101

v114

Compatible mapping h : V → IN+ iscalled depth or level if all differences onthe longest path and the initial value equalto 1.

U ← V; k ← 0while U 6= ∅ doT ← MinR(U); k ← k + 1for v ∈ T do h(v)← k

U ← U \ T

Drawing on levels. Macro Layers.



&

$

%

Compatible numberings

v14

v27

v311

v43 v58

v610

v79

v82

v96

v101

v115

Injective mapping h : V → 1..|V| compat-ible with relation R is called a compatiblenumbering.’Topological sort’

U ← V; k ← 0while U 6= ∅ do

select v ∈ MinR(U); k ← k + 1h(v)← k

U ← U \ {v}

Matrix display of acyclic network with vertices reordered according to acompatible numbering has a zero lower triangle.



&

$

%

. . . Compatible numberingsv1

v2

v3

v4

v5

v6

v7

v8

v9

v10

v11

v1 v2 v3 v4 v5 v6 v7 v8 v9 v10

v11

v10

v8

v4

v1

v11

v9

v2

v5

v7

v6

v3

v10

v8 v4 v1 v11

v9 v2 v5 v7 v6 v3



&

$

%

Compatible numberings and functions on acyclic networks

Let the function f : V → IR be defined in the following way:

• f(v) is knownn in sources v ∈ MinR(V)

• f(v) = F ({f(u) : uRv})

If we compute the values of function f in a sequence determined by acomptible numbering we can compute them in one pass since for eachvertex v ∈ V the values of f needed for its computation are already known.

Example: the earliest time T (v) of completion of all tasks entering the statev in CPM (Critical Path Method).



&

$

%

Citation networks

The citation network analysisstarted in 1964 with the paper ofGarfield et al. In 1989 Hummonand Doreian proposed three indices– weights of arcs that provide uswith automatic way to identifythe (most) important part of thecitation network. For two of theseindices we developed algorithms toefficiently compute them.


http://www.garfield.library.upenn.edu/papers/




&

$

%

. . . Citation networks

In a given set of units/vertices U (articles, books, works, etc.) we introducea citing relation/set of arcs R ⊆ U × U

uRv ≡ v cites u

which determines a citation network N = (U , R).

A citing relation is usually irreflexive (no loops) and (almost) acyclic. Weshall assume that it has these two properties. Since in real-life citationnetworks the strong components are small (usually 2 or 3 vertices) wecan transform such network into an acyclic network by shrinking strongcomponents and deleting loops. Other approaches exist. It is also useful totransform a citation network to its standardized form by adding a commonsource vertex s /∈ U and a common sink vertex t /∈ U . The source s islinked by an arc to all minimal elements of R; and all maximal elements ofR are linked to the sink t. We add also the ‘feedback’ arc (t, s).



&

$

%

Search path count method

The search path count (SPC)method is based on counters n(u, v)that count the number of differ-ent paths from s to t through thearc (u, v). To compute n(u, v) weintroduce two auxiliary quantities:n−(v) counts the number of differ-ent paths from s to v, and n+(v)counts the number of different pathsfrom v to t.



&

$

%

Fast algorithm for SPC

It follows by basic principles of combinatorics that

n(u, v) = n−(u) · n+(v), (u, v) ∈ R

where

n−(u) ={

1 u = s∑v:vRu n−(v) otherwise

and

n+(u) ={

1 u = t∑v:uRv n+(v) otherwise

This is the basis of an efficient algorithm for computing n(u, v) – after thetopological sort of the graph we can compute, using the above relationsin topological order, the weights in time of order O(m), m = |R|. Thetopological order ensures that all the quantities in the right sides of theabove equalities are already computed when needed.



&

$

%

Hummon and Doreian indices and SPC

The Hummon and Doreian indices are defined as follows:

• search path link count (SPLC) method: wl(u, v) equals the numberof “all possible search paths through the network emanating from anorigin node” through the arc (u, v) ∈ R.

• search path node pair (SPNP) method: wp(u, v) “accounts for allconnected vertex pairs along the paths through the arc (u, v) ∈ R”.

We get the SPLC weights by applying the SPC method on the networkobtained from a given standardized network by linking the source s by anarc to each nonminimal vertex from U ; and the SPNP weights by applyingthe SPC method on the network obtained from the SPLC network byadditionally linking by an arc each nonmaximal vertex from U to the sink t.



&

$

%

Vertex weights

The quantities used to compute the arc weights w can be used also to definethe corresponding vertex weights t

tc(u) = n−(u) · n+(u)

tl(u) = n−l (u) · n+l (u)

tp(u) = n−p (u) · n+p (u)

They are counting the number of paths of selected type through the vertexu.



&

$

%

Properties of SPC weights

The values of counters n(u, v) form a flow in the citation network – theKirchoff’s vertex law holds: For every vertex u in a standardized citationnetwork incoming flow = outgoing flow:∑

v:vRu

n(v, u) =∑

v:uRv

n(u, v) = n−(u) · n+(u)

The weight n(t, s) equals to the total flow through network and provides anatural normalization of weights

w(u, v) =n(u, v)n(t, s)

⇒ 0 ≤ w(u, v) ≤ 1

and if C is a minimal arc-cut-set∑

(u,v)∈C w(u, v) = 1.

In large networks the values of weights can grow very large. This should beconsidered in the implementation of the algorithms.



&

$

%

Nonacyclic citation networksIf there is a cycle in a network then there is also an infinite number of trailsbetween some units. There are some standard approaches to overcome theproblem: to introduce some ’aging’ factor which makes the total weightof all trails converge to some finite value; or to restrict the definition of aweight to some finite subset of trails – for example paths or geodesics. But,new problems arise: What is the right value of the ’aging’ factor? Is therean efficient algorithm to count the restricted trails?

The other possibility, since a citation network is usually almost acyclic, isto transform it into an acyclic network

• by identification (shrinking) of cyclic groups (nontrivial strong compo-nents), or

• by deleting some arcs, or

• by transformations such as the ’preprint’ transformation.



&

$

%

Preprint transformationThe preprint transformationis based on the followingidea: Each paper from astrong component is du-plicated with its ’preprint’version. The papers in-side strong component citepreprints.

Large strong components in citation network are unlikely – their presenceusually indicates an error in the data. An exception from this rule is the HEPcitation network of High Energy Particle Physics literature from arXiv. Init different versions of the same paper are treated as a unit. This leads tolarge strongly connected components. The idea of preprint transformationcan be used also in this case to eliminate cycles.


http://www.cs.cornell.edu/projects/kddcup/index.html

http://arxiv.org/


&

$

%

GenealogiesAnother example of acyclic networks are genealogies. In ’Sources’ wealready described the following network representations of genealogies:

I wife

fathermother

sisterbrother

son daughter

f-grandfather f-grandmotherm-grandfatherm-grandmother

stepmother

son-in-lawdaughter-in-law

sister-in-law

grandson

sister

grandson

I & wife

father & mother

f-grandfather & f-grandmother m-grandfather & m-grandmother

father & stepmother

son-in-law & daughterson & daughter-in-law

brother & sister-in-law Iwife

father mother

sister brother

son daughter

f-grandfather f-grandmother m-grandfather m-grandmother

stepmother

son-in-lawdaughter-in-law

sister-in-law

grandson

I & wife

father & mother

f-grandfather & f-grandmother m-grandfather & m-grandmother

father & stepmother

son-in-law & daughterson & daughter-in-law

brother & sister-in-law

Ore graph, p-graph, and bipartite p-graph



&

$

%

Properties of representations

p-graphs and bipartite p-graphs have many advantages:

• there are less vertices and lines in p-graphs than in corresponding Oregraphs;

• p-graphs are directed, acyclic networks;

• every semi-cycle of the p-graph corresponds to a relinking marriage.There exist two types of relinking marriages: blood marriage: e.g.,marriage among brother and sister, and non-blood marriage: e.g., twobrothers marry two sisters from another family.

• p-graphs are more suitable for analyses.

Bipartite p-graphs have an additional advantage: we can distinguishbetween a married uncle and a remarriage of a father. This property enablesus, for example, to find marriages between half-brothers and half-sisters.



&

$

%

Pattern searching

If a selected pattern determined by a given graph does not occur frequentlyin a sparse network the straightforward backtracking algorithm applied forpattern searching finds all appearences of the pattern very fast even in thecase of very large networks. Pattern searching was successfully applied tosearching for patterns of atoms in molecula (carbon rings) and searching forrelinking marriages in genealogies.

Damianus/Georgio/Legnussa/Babalio/

Marin/Gondola/Magdalena/Grede/

Nicolinus/Gondola/Franussa/Bona/

Marinus/Bona/Phylippa/Mence/

Sarachin/Bona/Nicoletta/Gondola/

Marinus/Zrieva/Maria/Ragnina/

Lorenzo/Ragnina/Slavussa/Mence/

Junius/Zrieva/Margarita/Bona/

Junius/Georgio/Anucla/Zrieva/

Michael/Zrieva/Francischa/Georgio/

Nicola/Ragnina/Nicoleta/Zrieva/

Three connected relinking marriages in thegenealogy (represented as a p-graph) of ra-gusan noble families. A solid arc indicatesthe is a son of relation, and a dotted arcindicates the is a daughter of relation.In all three patterns a brother and a sisterfrom one family found their partners in thesame other family.



&

$

%

. . . Pattern searching

To speed up the search or to consider some additional properties of thepattern, a user can set some additional options:

• vertices in network should match with vertices in pattern in somenominal, ordinal or numerical property (for example, type of atom inmolecula);

• values of edges must match (for example, edges representingmale/female links in the case of p-graphs);

• the first vertex in the pattern can be selected only from a given subsetof vertices in the network.


http://eclectic.ss.uci.edu/~drwhite/pgraph/p-graphs.html


&

$

%

Relinking patterns in p-graphs

A3

A4.1B4 A4.2

A5.1A5.2

B5

A6.1A6.2

A6.3

C6

B6.1 B6.2 B6.3

B6.4

A2

All possible relinking marriages in p-graphs with 2 to 6 vertices. Patterns arelabeled as follows:

• first character – number of firstvertices: A – single, B – two, C– three.

• second character: number of ver-tices in pattern (2, 3, 4, 5, or 6).

• last character: identifier (if the twofirst characters are identical).

Patterns denoted by A are exactly theblood marriages. In every pattern thenumber of first vertices equals to thenumber of last vertices.



&

$

%

Frequencies normalized with number of couples inp-graph × 1000.

pattern Loka Silba Ragusa Turcs RoyalA2 0.07 0.00 0.00 0.00 0.00A3 0.07 0.00 0.00 0.00 2.64A4.1 0.85 2.26 1.50 159.71 18.45B4 3.82 11.28 10.49 98.28 6.15A4.2 0.00 0.00 0.00 0.00 0.00A5.1 0.64 3.16 2.00 36.86 11.42A5.2 0.00 0.00 0.00 0.00 0.00B5 1.34 4.96 23.48 46.68 7.03A6.1 1.98 12.63 1.00 169.53 11.42A6.2 0.00 0.90 0.00 0.00 0.88A6.3 0.00 0.00 0.00 0.00 0.00C6 0.71 5.41 9.49 36.86 4.39B6.1 0.00 0.45 1.00 0.00 0.00B6.2 1.91 17.59 31.47 130.22 10.54B6.3 3.32 13.53 40.96 113.02 11.42B6.4 0.00 0.00 2.50 7.37 0.00Sum 14.70 72.17 123.88 798.53 84.36

Most of the relinking marriages happened in the genealogy of Turkish nomads; the second is

Ragusa while in other genealogies they are much less frequent.


http://vlado.fmf.uni-lj.si/pub/networks/data/esna/ragusa.htm

http://vlado.fmf.uni-lj.si/pub/networks/data/GED/P-Tur.GED

http://vlado.fmf.uni-lj.si/pub/networks/pajek/GED/royal/Royal92.zip

qmss: structure of networks - uni-lj.si

Documents