Network Analysis of Texts / 1. Networks (vlado.fmf.uni-lj.si/pub/networks/doc/seminar/lisbon01.pdf)

Mark Lombardi

ISEG Technical University of Lisbon

Introductory Workshop to Network Analysis of Texts

Networks

Vladimir Batagelj

University of Ljubljana

Lisbon, Portugal: 2nd to 5th February 2004

organized by SOCIUS – Research Centre on Economic Sociology and the Sociology of Organisations

version: 15 February 2004, 02:27

Outline

Introduction
Program
Roman roads (Peutinger)
Moreno: Who shall survive?
Development of DNA (Garfield)
Organic molecule 3CRO
Mind maps (Buzan, Russell)
Hijackers (Krebs)
Wall Street Follies
They Rule
Lombardi’s networks
Networks
Graph
Graph / Sets
Graph / Neighbors
Graph / Matrix
Special graphs – path, cycle, star, complete
Network
Size of network
Temporal networks
Two-mode networks
What is missing?
How to get a network?
Use of existing network data
Approaches to computer-assisted text analysis
Dictionary networks
Citation networks
Collaboration networks
Neighbors
Transformations
Networks from the Internet
Wrappers

ISEG, Lisbon, Portugal: Introductory Workshop to Network Analysis of Texts


Introduction

We (Andrej Mrvar and I) started to develop Pajek, a program for analysis and visualization of large networks, in 1996.

In 2002 we published two papers on network analysis of texts: Network Analysis of Dictionaries and Network Analysis of Texts.

These papers prompted Marta Pedro Varanda to propose that I give a workshop on network analysis of texts. Thank you very much, Marta.

The result is here.


Program

Monday to Thursday. Theory: 9.30-13h; practice: 14.30-17h.

Content of the course:

Day 1. Introduction to graphs, networks and Pajek. Obtaining networks from textual data.

Day 2. Structure of networks (statistical characteristics, components, cores, islands, ...)

Day 3. Important elements in networks (indices and weights). Acyclic networks.

Day 4. Clustering and blockmodeling. Pattern searching.

In the afternoons the concepts from the morning lectures will be applied in the analysis of real-life networks obtained from textual data.


Roman roads (Peutinger)


Moreno: Who shall survive?

[Figure: nine panels labeled K, 1, 2, 3, 4, 5, 6, 7, 8]


Development of DNA (Garfield)

In 1964 E. Garfield and collaborators constructed a ’citation’ network based on I. Asimov’s book The Genetic Code (1963). The analysis ’demonstrated a high degree of coincidence between an historian’s account of events and the citational relationship between these events’.


Organic molecule 3CRO


Mind maps (Buzan, Russell)


Hijackers (Krebs)

Excerpt from Krebs: Mapping Networks of Terrorist Cells, p. 50.

Figure 4. Hijacker’s Network Neighborhood

This dense under-layer of prior trusted relationships made the hijacker network both stealth and resilient. Although we don’t know all of the internal ties of the hijackers’ network it appears that many of the ties were concentrated around the pilots. This is a risky move for a covert network. Concentrating both unique skills and connectivity in the same nodes makes the network easier to disrupt – once it is discovered. Peter Klerks (Klerks 2001) makes an excellent argument for targeting those nodes in the network that have unique skills. By removing those necessary skills from the project, we can inflict maximum damage to the project mission and goals. It is possible that those with unique skills would also have unique ties within the network. Because of their unique human capital and their high social capital the pilots were the richest targets for removal from the network. Unfortunately they were not discovered in time.


Wall Street Follies


They Rule


Lombardi’s networks

Mark Lombardi (1951-2000) transformed business relations into art.


Networks

A network is based on two sets: a set of vertices (nodes), which represent the selected units, and a set of links (lines), which represent relations between units. Together they determine a graph.

A link can be directed (an arc) or undirected (an edge).

Additional data about vertices or links may be known: their properties (attributes). For example: name/label, type, value, ...


Graph

actor – vertex, node

relation – link, edge, arc, line, tie

arc = directed link, (a, d): a is the initial vertex, d is the terminal vertex.

edge = undirected link, (c: d): c and d are end vertices.


Graph / Sets

V = {a, b, c, d, e, f, g, h, i, j, k, l}

A = {(a, b), (a, d), (a, f), (b, a), (b, f), (c, b), (c, c), (c, g), (c, g), (e, c), (e, f), (e, h), (f, k), (h, d), (h, l), (j, h), (l, e), (l, g), (l, h)}

E = {(b: e), (c: d), (e: g), (f: h)}

G = (V, A, E)

L = A ∪ E

A = ∅ – undirected graph; E = ∅ – directed graph.

Pajek: local: GraphSet; TinaSet; WWW: GraphSet / net; TinaSet / net, picture, picture.
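The set description can be written down almost literally in a programming language. The following is a minimal Python sketch, not Pajek's internal representation: the names `graph_kind` and the label "mixed" are assumptions made for this illustration. A `Counter` is used because the arc (c, g) occurs twice in A.

```python
from collections import Counter

# Vertex set of the example graph
V = set("abcdefghijkl")

# Arcs as ordered pairs; a Counter keeps the multiplicity of the parallel arc (c, g)
A = Counter([
    ("a", "b"), ("a", "d"), ("a", "f"), ("b", "a"),
    ("b", "f"), ("c", "b"), ("c", "c"), ("c", "g"),
    ("c", "g"), ("e", "c"), ("e", "f"), ("e", "h"),
    ("f", "k"), ("h", "d"), ("h", "l"), ("j", "h"),
    ("l", "e"), ("l", "g"), ("l", "h"),
])

# Edges as unordered pairs, so that (b: e) and (e: b) denote the same edge
E = Counter(frozenset(pair) for pair in
            [("b", "e"), ("c", "d"), ("e", "g"), ("f", "h")])


def graph_kind(arcs, edges):
    """Classify a graph by which of its link sets is empty."""
    if not arcs:
        return "undirected"   # A is empty
    if not edges:
        return "directed"     # E is empty
    return "mixed"            # label chosen for this sketch only


print(graph_kind(A, E), sum(A.values()), sum(E.values()))
```

Storing edges as `frozenset` objects is one possible design choice; it makes the order of the two end vertices irrelevant when edges are compared.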


Graph / Neighbors

$N_A(a) = \{b, d, f\}$
$N_A(b) = \{a, f\}$
$N_A(c) = \{b, c, g, g\}$
$N_A(e) = \{c, f, h\}$
$N_A(f) = \{k\}$
$N_A(h) = \{d, l\}$
$N_A(j) = \{h\}$
$N_A(l) = \{e, g, h\}$

$N_E(e) = \{b, g\}$
$N_E(c) = \{d\}$
$N_E(f) = \{h\}$

Pajek: local: GraphList; TinaList; WWW: GraphList / net; TinaList / net.

$N(v) = N_A(v) \cup N_E(v)$, also $N_{out}(v)$, $N_{in}(v)$

Star in $v$, $S(v)$, is the set of all lines with $v$ as their initial vertex.
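The neighbor lists can be derived from the arc and edge lists of the example graph. The Python sketch below is illustrative only; the names `N_A`, `N_E` and `neighbors` mirror the notation of the slide and are not Pajek functions. One assumption differs from the listing above: each edge is recorded at both of its end vertices, whereas the listing names every edge only once.

```python
from collections import defaultdict

arcs = [
    ("a", "b"), ("a", "d"), ("a", "f"), ("b", "a"),
    ("b", "f"), ("c", "b"), ("c", "c"), ("c", "g"),
    ("c", "g"), ("e", "c"), ("e", "f"), ("e", "h"),
    ("f", "k"), ("h", "d"), ("h", "l"), ("j", "h"),
    ("l", "e"), ("l", "g"), ("l", "h"),
]
edges = [("b", "e"), ("c", "d"), ("e", "g"), ("f", "h")]

# Arc neighbors: terminal vertices of the arcs that start in a vertex
N_A = defaultdict(list)
for u, v in arcs:
    N_A[u].append(v)

# Edge neighbors: an edge is visible from both of its end vertices
N_E = defaultdict(list)
for u, v in edges:
    N_E[u].append(v)
    N_E[v].append(u)


def neighbors(v):
    """N(v): union of arc neighbors and edge neighbors of v."""
    return set(N_A[v]) | set(N_E[v])


print(sorted(neighbors("e")))
```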


Graph / Matrix

    a b c d e f g h i j k l
a   0 1 0 1 0 1 0 0 0 0 0 0
b   1 0 0 0 1 1 0 0 0 0 0 0
c   0 1 1 1 0 0 2 0 0 0 0 0
d   0 0 1 0 0 0 0 0 0 0 0 0
e   0 1 1 0 0 1 1 1 0 0 0 0
f   0 0 0 0 0 0 0 1 0 0 1 0
g   0 0 0 0 1 0 0 0 0 0 0 0
h   0 0 0 1 0 1 0 0 0 0 0 1
i   0 0 0 0 0 0 0 0 0 0 0 0
j   0 0 0 0 0 0 0 1 0 0 0 0
k   0 0 0 0 0 0 0 0 0 0 0 0
l   0 0 0 0 1 0 1 1 0 0 0 0

Pajek: local: GraphMat; TinaMat, picture, picture; WWW: GraphMat / net; TinaMat / net, paj.

Graph G is simple if all entries of the corresponding matrix are 0 or 1.
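The matrix above can be rebuilt from the arc and edge lists: an arc (u, v) adds 1 to entry [u, v], and an edge (u: v) adds 1 to both [u, v] and [v, u]. A short Python sketch follows; the helper name `is_simple` is an assumption of this illustration.

```python
vertices = list("abcdefghijkl")
index = {v: i for i, v in enumerate(vertices)}

arcs = [
    ("a", "b"), ("a", "d"), ("a", "f"), ("b", "a"),
    ("b", "f"), ("c", "b"), ("c", "c"), ("c", "g"),
    ("c", "g"), ("e", "c"), ("e", "f"), ("e", "h"),
    ("f", "k"), ("h", "d"), ("h", "l"), ("j", "h"),
    ("l", "e"), ("l", "g"), ("l", "h"),
]
edges = [("b", "e"), ("c", "d"), ("e", "g"), ("f", "h")]

n = len(vertices)
M = [[0] * n for _ in range(n)]

# An arc contributes to one entry only
for u, v in arcs:
    M[index[u]][index[v]] += 1

# An edge contributes symmetrically (the example has no undirected loops)
for u, v in edges:
    M[index[u]][index[v]] += 1
    M[index[v]][index[u]] += 1


def is_simple(matrix):
    """A graph is simple if every matrix entry is 0 or 1."""
    return all(x in (0, 1) for row in matrix for x in row)


for v, row in zip(vertices, M):
    print(v, *row)
print("simple:", is_simple(M))
```

The example graph is not simple, because the parallel arc (c, g) produces the entry 2 in row c.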


Special graphs – path, cycle, star, complete

Graphs: path $P_5$, cycle $C_7$, star $S_8$ and complete graph $K_7$.


Network

A network is a graph with additional data about vertices and/or links:

$N = (V, L, F_V, F_L)$

Properties of vertices $F_V$ and links $F_L$ can be measured in different scales: numerical, ordinal and nominal. They can be input as data or computed from the network.

In Pajek numerical properties of vertices are represented by vectors, nominal properties by partitions or as labels of vertices. A numerical property can be displayed as the size of a vertex (figure) or as its coordinate; a nominal property as the color or shape of the figure, or as a vertex label.

In Pajek we can assign numerical values to links. They can be displayed as value, thickness or grey level. Nominal values can be assigned as label, color or line pattern (see Pajek manual, section 4.3).


Display of properties – school (Moody)


Size of network

The size of a network/graph is expressed by two numbers: the number of vertices $n = |V|$ and the number of links $m = |L|$.

In a simple undirected graph (no parallel edges, no loops) $m \le \frac{1}{2} n(n-1)$; and in a simple directed graph (no parallel arcs) $m \le n^2$.

The quotient $\gamma = \frac{m}{m_{\max}}$ is the density of the graph.

Small networks (some tens of vertices) can be represented by a picture and analyzed by many algorithms (UCINET, NetMiner).

Middle-size networks (some hundreds of vertices) can still be represented by a picture (!?), but some analytical procedures can't be used.

Large networks (several thousands of vertices) are too big to be displayed; special algorithms are needed for their analysis (Pajek).
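As a worked example of the density formula, the following Python sketch computes m_max and γ for the two cases given above. The function names are assumptions of this illustration; the sample values n = 219 and m = 631 are those of the Krebs Internet industries network presented later in these slides, treated here as a simple undirected graph.

```python
def max_links(n, directed):
    """Largest possible number of links in a simple graph on n vertices."""
    if directed:
        return n * n             # loops allowed, no parallel arcs
    return n * (n - 1) // 2      # no loops, no parallel edges


def density(n, m, directed):
    """Density gamma = m / m_max."""
    return m / max_links(n, directed)


n, m = 219, 631
print(max_links(n, directed=False))            # 23871
print(round(density(n, m, directed=False), 4))
```

A density of only a few percent is typical for such a network: most pairs of companies have no announced partnership.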


Temporal networks

In a temporal network the presence/activity of a vertex/link can change through time. Pajek supports two types of descriptions of temporal networks, based on presence and on events.

Moody:

Drug users in Colorado Springs, 5 years


Temporal networks – presence

*Vertices 3
1 "a" [5-10,12-14]
2 "b" [1-3,7]
3 "e" [4-*]
*Edges
1 2 1 [7]
1 3 1 [6-8]

Vertex a is present in time intervals 5 to 10 and 12 to 14.

Edge (1 : 3) is present in the time interval 6 to 8.

For a link to be present, both of its end-vertices must be present.

Time.net.
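The presence lists in square brackets are easy to interpret programmatically. The Python sketch below is an illustration under stated assumptions, not Pajek's own parser: the function names are hypothetical, `*` is read as "no upper bound", and a single number such as `7` is read as the one-point interval 7 to 7.

```python
def parse_intervals(spec):
    """Turn a presence list such as '[5-10,12-14]' into (first, last) pairs."""
    pairs = []
    for part in spec.strip("[]").split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo                                  # '7' means 7 to 7
        last = float("inf") if hi == "*" else int(hi)  # '*' means open-ended
        pairs.append((int(lo), last))
    return pairs


def is_present(intervals, t):
    """True if time point t lies in one of the intervals."""
    return any(lo <= t <= hi for lo, hi in intervals)


def link_present(link, end1, end2, t):
    """A link is present only when it and both end-vertices are present."""
    return all(is_present(iv, t) for iv in (link, end1, end2))


a = parse_intervals("[5-10,12-14]")
b = parse_intervals("[1-3,7]")
e = parse_intervals("[4-*]")
edge_13 = parse_intervals("[6-8]")

print(a, b, e)
print([t for t in range(1, 16) if link_present(edge_13, a, e, t)])
```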


Temporal networks – events

Event       Explanation
TI t        initial events – following events happen when time point t starts
TE t        end events – following events happen when time point t is finished
AV v n s    add vertex v with label n and properties s
HV v        hide vertex v
SV v        show vertex v
DV v        delete vertex v
AA u v s    add arc (u,v) with properties s
HA u v      hide arc (u,v)
SA u v      show arc (u,v)
DA u v      delete arc (u,v)
AE u v s    add edge (u:v) with properties s
HE u v      hide edge (u:v)
SE u v      show edge (u:v)
DE u v      delete edge (u:v)
CV v s      change property of vertex v to s
CA u v s    change property of arc (u,v) to s
CE u v s    change property of edge (u:v) to s
CT u v      change (un)directedness of line (u,v)
CD u v      change direction of arc (u,v)
PE u v s    replace pair of arcs (u,v) and (v,u) by single edge (u:v) with properties s
AP u v s    add pair of arcs (u,v) and (v,u) with properties s
DP u v      delete pair of arcs (u,v) and (v,u)
EP u v s    replace edge (u:v) by pair of arcs (u,v) and (v,u) with properties s

s can be empty. In case of parallel links, :k denotes the k-th link – HE:3 14 37 hides the third edge connecting vertices 14 and 37.

*Vertices 3
*Events
TI 1
AV 2 "b"
TE 3
HV 2
TI 4
AV 3 "e"
TI 5
AV 1 "a"
TI 6
AE 1 3 1
TI 7
SV 2
AE 1 2 1
TE 7
DE 1 2
DV 2
TE 8
DE 1 3
TE 10
HV 1
TI 12
SV 1
TE 14
DV 1

Time.tim. Friends.tim.
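The event list above describes the same vertex history as the presence-based description. A small replay makes this visible. The Python sketch below is a deliberate simplification and only an assumption about how the events could be interpreted: it handles the vertex events AV, SV, HV and DV, treats hiding and deleting alike, and ignores all link events. The function name `vertex_presence` is hypothetical.

```python
EVENTS = """TI 1
AV 2 "b"
TE 3
HV 2
TI 4
AV 3 "e"
TI 5
AV 1 "a"
TI 6
AE 1 3 1
TI 7
SV 2
AE 1 2 1
TE 7
DE 1 2
DV 2
TE 8
DE 1 3
TE 10
HV 1
TI 12
SV 1
TE 14
DV 1
""".splitlines()


def vertex_presence(lines, t_max):
    """Return {vertex: [time points at which the vertex is visible]}."""
    start, end = {}, {}      # events at the start / at the end of a time point
    bucket = None
    for line in lines:
        if not line.strip():
            continue
        op, *args = line.split()
        if op == "TI":
            bucket = start.setdefault(int(args[0]), [])
        elif op == "TE":
            bucket = end.setdefault(int(args[0]), [])
        elif bucket is not None:
            bucket.append((op, args))

    visible, presence = set(), {}

    def apply(op, args):
        if op in ("AV", "SV"):        # add or show a vertex
            visible.add(int(args[0]))
        elif op in ("HV", "DV"):      # hide or delete a vertex
            visible.discard(int(args[0]))
        # link events are ignored in this sketch

    for t in range(1, t_max + 1):
        for op, args in start.get(t, []):
            apply(op, args)
        for v in visible:
            presence.setdefault(v, []).append(t)
        for op, args in end.get(t, []):
            apply(op, args)
    return presence


print(vertex_presence(EVENTS, 15))
```

Under these assumptions vertex 1 is visible at time points 5 to 10 and 12 to 14, and vertex 2 at 1 to 3 and 7, which agrees with the presence lists of the presence-based description.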


Temporal networks / September 11

Steve Corman and collaborators from Arizona State University used his Centering Resonance Analysis (CRA) to transform daily Reuters news (66 days) about September 11th into a temporal network of word co-appearances.

Pictures in SVG: 66 days.


Two-mode networks

In a two-mode network $N = (U, V, L, F_V, F_L)$ the set of vertices consists of two disjoint sets of vertices $U$ and $V$, and every link from $L$ has one end-vertex in $U$ and the other in $V$. Often a weight $w : L \to \mathbb{R}$, $w \in F_L$, is also given; if not, we assume $w(u, v) = 1$ for all $(u, v) \in L$.

A two-mode network can also be described by a rectangular matrix $A = [a_{uv}]_{U \times V}$:

$$a_{uv} = \begin{cases} w_{uv} & (u, v) \in L \\ 0 & \text{otherwise} \end{cases}$$

Examples: (persons, societies, years of membership), (buyers/consumers, goods, quantity), (parliamentarians, problems, positive vote), (persons, journals, reading).

Authors and works.
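The rectangular matrix of a two-mode network can be built directly from its weighted links. The Python sketch below follows the first example (persons, societies, years of membership); the persons, societies and numbers are invented for illustration, and the function name `two_mode_matrix` is an assumption.

```python
def two_mode_matrix(U, V, weights):
    """Matrix A with a[u][v] = w(u, v) if (u, v) is a link, else 0."""
    return [[weights.get((u, v), 0) for v in V] for u in U]


# Hypothetical data: years of membership of persons in societies
persons = ["Ana", "Bor", "Cene"]
societies = ["chess", "choir"]
years = {
    ("Ana", "chess"): 3,
    ("Ana", "choir"): 1,
    ("Cene", "choir"): 5,
}

A = two_mode_matrix(persons, societies, years)
for person, row in zip(persons, A):
    print(person, row)
```

Rows correspond to the first set U and columns to the second set V, so the matrix is in general not square.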


Deep South

A classical example of a two-mode network is the Southern women data set (Davis 1941).

Davis.paj. Freeman’s overview.


What is missing?

The current version of Pajek doesn’t support multiple networks (several relations on the same set of vertices). They can be described by an appropriate encoding (value, color, label) of relations, or as a temporal network.

Pajek also doesn’t support partitions of links.


How to get a network?

When collecting data about a network $N = (V, L, F_V, F_L)$ we first have to decide what the units (vertices) are (network boundaries), when two units are related (network completeness), and which properties of vertices/links we shall consider.

These questions are especially crucial in measurements of social networks (questionnaires, interviews, observations, archive records, experiments, ...).

For large sets of units we can’t measure the complete network; we limit the data collection to selected units and their neighbors. We get an ego-centric network.


Use of existing network data

Pajek supports input of network data in several formats: UCINET’s DL files, graphs from project Vega, molecules in MDL MOL, MAC, BS; genealogies in GEDCOM.

Davis.DAT, C84N24.VGR, MDL, 1CRN.BS, DNA.BS, ADF073.MAC, Bouchard.GED.

Several network data sets are already available in computer-readable form and only need to be transformed into network descriptions.


Approaches to computer-assisted text analysis

R. Popping: Computer-Assisted Text Analysis (2000) distinguishes three main approaches to CaTA: thematic TA, semantic TA, and network TA.

Terms considered in TA are collected in a dictionary (it can be fixed in advance, or built dynamically). The two main problems with terms are equivalence (different words representing the same term) and ambiguity (the same word representing different terms). Because of these the coding, that is the transformation of raw text data into a formal description, is done mainly manually or semiautomatically. As units of TA we usually consider clauses, statements, paragraphs, news, messages, ...

Until now thematic and semantic TA have mainly used statistical methods for analysis of the coded data.


. . . approaches to CaTA

In thematic TA the units are coded as a rectangular matrix Text units × Concepts, which can be considered as a two-mode network.

Examples: M.M. Miller: VBPro, H. Klein: Text Analysis / TextQuest.

In semantic TA the units (often clauses) are encoded according to the S-V-O (Subject-Verb-Object) model or its improvements.

[Diagram: subject, verb, object]

Examples: Roberto Franzosi; KEDS, Tabari (see also: Paul Hensel’s International Relations Data Site, International Conflict and Cooperation Data, Gulf.zip, Recode.zip)

This coding can be directly considered as a network with Subjects ∪ Objects as vertices and links labeled with Verbs.
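The step from S-V-O coding to a network can be sketched in a few lines of Python. The clauses below are invented for illustration and the function name `svo_network` is an assumption; real coded data (for example from KEDS) has its own format, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical coded clauses (subject, verb, object)
clauses = [
    ("government", "criticizes", "union"),
    ("union", "calls", "strike"),
    ("government", "negotiates_with", "union"),
]


def svo_network(triples):
    """Vertices are Subjects ∪ Objects; each arc carries its verbs as labels."""
    vertices = set()
    labels = defaultdict(list)        # (subject, object) -> list of verbs
    for subject, verb, obj in triples:
        vertices.update((subject, obj))
        labels[(subject, obj)].append(verb)
    return vertices, dict(labels)


vertices, arcs = svo_network(clauses)
print(sorted(vertices))
for (s, o), verbs in arcs.items():
    print(s, "->", o, verbs)
```

Keeping a list of verbs per arc is one possible choice; parallel arcs, one per clause, would be an equally valid representation.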


Network CaTA

TextAnalyst’s ’semantic network’

This way we have already stepped into network TA.

Examples: Carley: Cognitive maps, J.A. de Ridder: CETA, Megaputer: TextAnalyst.

See also: W. Evans: Computer Environments for Content Analysis, K.A. Neuendorf: The Content Analysis Guidebook / Online and H.D. White: Publications.

There are additional ways to obtain networks from textual data.


Dictionary networks

book description in ODLIS

In a dictionary graph the terms determine the set of vertices, and there is an arc (u, v) from term u to term v iff the term v appears in the description of term u.
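The definition translates into a short procedure. The Python sketch below is a simplified illustration, assuming single-word terms and exact word matches (so "leaves" does not match the term "leaf"); the three entries are invented and the function name `dictionary_arcs` is hypothetical. Real dictionaries such as ODLIS need multi-word terms and some normalization of word forms.

```python
import re

# Hypothetical mini-dictionary: term -> description
entries = {
    "book": "a collection of leaves bound within a cover",
    "cover": "the outer protective part of a book",
    "leaf": "a single sheet in a book",
}


def dictionary_arcs(entries):
    """Arc (u, v) iff term v appears as a word in the description of term u."""
    terms = set(entries)
    arcs = set()
    for u, description in entries.items():
        words = set(re.findall(r"[a-z]+", description.lower()))
        for v in terms & words:
            arcs.add((u, v))
    return arcs


print(sorted(dictionary_arcs(entries)))
```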

Online Dictionary of Library and Information Science ODLIS, Odlis.net (2909 / 18419).

Free On-line Dictionary of Computing FOLDOC, Foldoc2b.net (133356 / 120238).

Artlex, Wordnet, OpenCyc.

The Edinburgh Associative Thesaurus (EAT) / net; NASA Thesaurus.

Paper.


Citation networks

In a citation graph the vertices are different publications from the selected area; two publications are connected by an arc if the first is cited by the second. Citation networks are almost acyclic.

E. Garfield: HistCite / Pajek, papers.

An example of a very large citation network is US Patents / Nber, n = 3774768, m = 16522438.


Collaboration networks

Units in a collaboration network are usually individuals or institutions. Two units are related if they produced a joint work. The weight is the number of such works.
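Given a list of works with their authors, the weights can be counted as follows. This is a minimal Python sketch; the author names are placeholders and the function name `collaboration_weights` is an assumption. It presumes that author names have already been cleaned, so that each person appears under one spelling.

```python
from collections import Counter
from itertools import combinations


def collaboration_weights(works):
    """Count, for every pair of authors, the number of their joint works."""
    weights = Counter()
    for authors in works:
        # every unordered pair of distinct co-authors of one work counts once
        for u, v in combinations(sorted(set(authors)), 2):
            weights[(u, v)] += 1
    return weights


# Hypothetical author lists, one per work
works = [["A", "B"], ["B", "A", "C"], ["C"]]
print(collaboration_weights(works))
```

Sorting the authors of each work gives every pair a canonical order, so (A, B) and (B, A) are counted as the same edge.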

A famous example of a collaboration network is The Erdős Number Project, Erdos.net.

A rich source of data for producing collaboration networks are the BibTeX bibliographies: Nelson H. F. Beebe’s Bibliographies Page.

For example B. Jones: Computational geometry database (2002), FTP, Geom.net.

An initial collaboration network can be produced from such data with some programming. A tedious ’cleaning’ process then follows.

An interesting dataset: The Internet Movie Database.


Krebs Internet industries

Each node in the network represents a company that competes in the Internet industry, 1998 to 2001.

n = 219, m = 631.

red – content,

blue – infrastructure,

yellow – commerce.

Two companies are connected with an edge if they have announced a joint venture, strategic alliance or other partnership.

URL: http://www.orgnet.com/netindustry.html. Recode, InfoRapid.


Neighbors

Suppose that a dissimilarity $d(u, v)$ is defined on the set of units $V$. Two types of networks can be defined:

k-closest neighbors: $N(k) = (V, A, d)$

$(u, v) \in A \Leftrightarrow v$ is among the $k$ closest neighbors of $u$

r-neighbors: $N(r) = (V, E, d)$

$(u : v) \in E \Leftrightarrow d(u, v) \le r$
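Both definitions can be sketched by brute force in Python. The function names and the toy data (four numbers with the absolute difference as dissimilarity) are assumptions of this illustration. The sketch compares all pairs of units, so it needs on the order of n² evaluations of d and is suitable only for small sets of units; ties among equally close neighbors are broken by input order.

```python
from itertools import combinations


def k_closest_arcs(units, d, k):
    """Arc (u, v) iff v is among the k closest neighbors of u."""
    arcs = set()
    for u in units:
        others = sorted((v for v in units if v != u), key=lambda v: d(u, v))
        arcs.update((u, v) for v in others[:k])
    return arcs


def r_neighbor_edges(units, d, r):
    """Edge (u : v) iff d(u, v) <= r."""
    return {frozenset((u, v))
            for u, v in combinations(units, 2) if d(u, v) <= r}


# Toy data: units are numbers, dissimilarity is the absolute difference
units = [0, 1, 3, 7]


def dist(u, v):
    return abs(u - v)


print(sorted(k_closest_arcs(units, dist, 1)))
print(sorted(sorted(e) for e in r_neighbor_edges(units, dist, 2)))
```

The k-closest neighbors relation is in general not symmetric (here 3 points to 1, but 1 points to 0), which is why it yields arcs, while the r-neighbors relation is symmetric and yields edges.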

These networks provide a link between data analysis and network analysis.

Efficient algorithms ?!

Fisher’s Iris data.


Transformations

Words graph – words from a given set are vertices; two words are related iff one can be obtained from the other by a change (add, delete, replace) of a single character. DIC28, Paper.
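The relation of the words graph can be tested pairwise. The following Python sketch is a brute-force illustration with an invented word list; the function names are assumptions, and for a large word set such as DIC28 a faster method than comparing all pairs would be needed.

```python
from itertools import combinations


def one_change_apart(a, b):
    """True if b results from a by adding, deleting or replacing one character."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter (or equal) word
    i = 0
    while i < len(a) and a[i] == b[i]:   # skip the common prefix
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]    # replace: the rest must agree
    return a[i:] == b[i + 1:]            # add/delete: skip one character of b


def words_graph(words):
    """Edges of the words graph on the given set of words."""
    return {frozenset((u, v))
            for u, v in combinations(sorted(set(words)), 2)
            if one_change_apart(u, v)}


words = ["cat", "cot", "cart", "cats", "dog", "dot"]
for edge in sorted(sorted(e) for e in words_graph(words)):
    print(*edge)
```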

Text network – vertices are (selected) words from a given text; two words are related if they co-appear in the selected type of ’window’ (same sentence, k consecutive words, ...). The weights count such co-appearances.

Example: CRA.
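For the window of k consecutive words the weights can be counted as in the Python sketch below. It is a simplified illustration and not the CRA method: the function name is an assumption, no word selection or stemming is done, and each pair of word positions less than k apart is counted once.

```python
from collections import Counter


def cooccurrence_weights(words, k):
    """Weight of {u, v} = number of position pairs of u and v less than k apart."""
    weights = Counter()
    for i, u in enumerate(words):
        for v in words[i + 1:i + k]:     # the next k - 1 words share a window with u
            if u != v:
                weights[tuple(sorted((u, v)))] += 1
    return weights


text = "network analysis of texts network analysis".split()
for (u, v), w in sorted(cooccurrence_weights(text, 2).items()):
    print(u, v, w)
```

With k = 2 only adjacent words are related; a larger k produces a denser network.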


Networks from the Internet

KartOO network

Internet Mapping Project.

Links among WWW pages.

KartOO, TouchGraph.

E-mail and other services.

Server’s logs.

Cybergeography, CAIDA.


Wrappers

Web wrappers are special programs for collecting information from web pages, often returned in XML format.

Several tools exist for automatic generation of wrappers: (paper / list / LAPIS).

Free programs: XWRAP (description / page) and TSIMMIS (description / page).

Among commercial programs lixto seems to be the best.

Additional URLs: 1, 2, 3.
