flow processes and the structural importance of nodes

Post on 08-Feb-2016



Flow Processes and the Structural Importance of Nodes

Steve Borgatti
Boston College

(Figure: network diagram; labeled node: Mohamed Atta. Data courtesy of Valdis Krebs.)

Attacking Terrorist Nets

• Find and eliminate structurally important nodes and lines
– bridges, cut-points; minimum weight cutsets
– measures of centrality
  • closeness, betweenness, eigenvector, etc.

Terrorist Network

(Figure: network diagram; labeled nodes include Mohamed Atta, Djamal Beghal, Essid Sami Ben Khemais, Mamoun Darkazanli, Nawaf Alhazmi, Raed Hijazi, Usman Bandukra. Data courtesy of Valdis Krebs.)

Many Problems

• Data not good enough
– Mostly known after an event
– Sensitive to error
• Benefits are short-term at best
– Must address recruitment, training
– It is precisely those organizations that make heavy use of suicide bombers that are organized as networks

Was al Qaeda incapacitated by removal of 19 hijackers?

(Figure: network diagram with legend "Dead" / "Alive?". Data courtesy of Valdis Krebs.)

One Additional Problem

• Centrality measures make certain assumptions about how things flow
– and may produce poor estimates when misapplied
– need to work that out before deciding which node to remove

Objective

• Enumerate kinds of flow processes
• Analyze properties
• Relate to structural importance of nodes
• Relate to existing measures of centrality

Types of Flow Processes

• Gift process
• Currency process
• Transport process
• Postal process
• Gossip process
• E-mail process
• Infection process
• Influence process

(several others)

Gift Process

• Canonical example:
– passing along used paperback novel
• Single object in only one place at a time
• Doesn’t travel between same pair twice
• Could be received by the same person twice
• A--B--C--B--D--E--B--F--C ...

Currency Process

• Canonical example:
– specific dollar bill moving through the economy
• Single object in only one place at a time
• Can travel between same pair more than once
• A--B--C--B--C--D--E--B--C--B--C ...

Gossip Process

• Example:
– juicy story moving through informal network
• Multiple copies exist simultaneously
• Person tells only one person at a time*
• Doesn’t travel between same pair twice
• Can reach same person multiple times

* More generally, they tell a very limited number at a time.

E-Mail Process

• Example:
– forwarded jokes and virus warnings
– e-mail viruses themselves
• Multiple copies exist simultaneously
• All (or many) connected nodes told simultaneously (except the immediate source?)

Influence Process

• Example:
– attitude formation
• Multiple “copies” exist simultaneously
• Multiple simultaneous transmission, even between the same pairs of nodes

Infection Process

• Example:
– virus which activates effective immunological response
• Multiple copies may exist simultaneously
• Cannot revisit a node
• A--B--C--E--D--F...

Postal Process

• Example:
– package delivered by postal service
• Single object at only one place at one time
• Map of network enables the intelligent object to select only the shortest paths to all destinations

Uncovering Flow Properties

• Take componential analysis approach
– identify a set of flow processes
– compare and contrast to discover minimum set of attributes (properties) that distinguish them from each other
– view each distinct flow process as unique bundle of properties -- typology

Properties of Flow Processes

• Sequence type: path, trail, walk
– path: can’t revisit node or edge (tie)
– trail: can revisit node but not edges
– walk: can revisit edges & nodes
• Deterministic vs non-deterministic
– blind vs guided
– guided: always chooses best route; aware of map
• Combine into 4-way “pattern” property:
– geodesics, paths, trails, walks
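The path / trail / walk distinction can be checked mechanically. The sketch below is only an illustration: the function name `classify_sequence` and the `directed` flag are assumptions, not part of the slides. By default it treats each step as an ordered (sender, receiver) pair, which is the reading under which the gift example A--B--C--B--D--E--B--F--C repeats no tie.

```python
def classify_sequence(nodes, directed=True):
    """Classify a node sequence as 'path', 'trail', or 'walk'."""
    steps = list(zip(nodes, nodes[1:]))
    if not directed:
        # Treat A->B and B->A as the same tie.
        steps = [tuple(sorted(step)) for step in steps]
    if len(set(steps)) < len(steps):
        return "walk"   # some tie is used more than once
    if len(set(nodes)) < len(nodes):
        return "trail"  # ties are unique, but a node is revisited
    return "path"       # neither nodes nor ties repeat


# Example sequences taken from the gift, currency, and infection slides.
gift = "A B C B D E B F C".split()
currency = "A B C B C D E B C B C".split()
infection = "A B C E D F".split()
print(classify_sequence(gift), classify_sequence(currency), classify_sequence(infection))
# prints: trail walk path
```

With `directed=False` the gift sequence is classified as a walk, because B--C and C--B then count as the same tie.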

Properties -- cont.

• Duplication vs transfer (copy vs move)
– transfer/move: only one place at one time
– duplication/copy: multiple copies exist
• Serial vs parallel duplication
– serial: only one transmission at a time
– parallel: broadcast to all surrounding nodes
• Combine into “method” 3-way property:
– parallel dup., serial dup., transfer

Simplified Typology

| | parallel duplication | serial duplication | transfer |
|---|---|---|---|
| geodesics | | | postal |
| paths | nameserver | virus | moocher |
| trails | e-mail | gossip | gift |
| walks | influence | | currency |

(Slide annotations: "goods", "information".)

So What?

• The properties of a flow process (together w/ node position) determine which nodes are structurally important
– a node that is important in one process is not important in another
– off-the-shelf centrality measures implicitly assume certain flow properties and are only interpretable for certain flow processes (à la Friedkin)

Closeness Centrality

• A node’s centrality is sum of geodesic distances to all others.
– Length of shortest paths
• Is index of expected time until arrival of that-which-flows for consistent processes:
– non-deterministic (e.g., postal)
– parallel duplication (e.g., e-mail, nameserver)

(Figure: example network with nodes S, L, Q, P, X, T, M.)
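As a minimal sketch of the definition above (sum of geodesic distances to all others), the following uses breadth-first search on an undirected, connected network stored as an adjacency dictionary. The function names and the four-node chain are hypothetical, chosen only for illustration.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness(adj):
    """Sum of geodesic distances from each node to all others (lower = more central)."""
    return {node: sum(geodesic_distances(adj, node).values()) for node in adj}


# Hypothetical 4-node chain A - B - C - D (assumed connected).
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(closeness(chain))  # {'A': 6, 'B': 4, 'C': 4, 'D': 6}
```

Because the score is a sum of distances, the two middle nodes of the chain get the lowest (most central) values.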

Calculating Closeness Centrality

How long does a token take to reach a node?

| | parallel duplication | serial duplication | transfer |
|---|---|---|---|
| geodesics | Freeman | Freeman | Freeman |
| paths | Freeman | NEW | NO |
| trails | Freeman | NEW | NO |
| walks | Freeman | ? | Markov |

Betweenness Centrality

• Count no. of geodesic paths from each node to every other node that pass through X
– if there is more than one geodesic from S to T, count the proportion that pass through X
• Interpret as
– how often node utilized by others
– potential for control & synthesis

(Figure: example network with nodes S, L, Q, P, X, T, M.)
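The counting rule above, including the proportional split when several geodesics join S and T, can be sketched as follows. This is an illustrative implementation under stated assumptions: the network is undirected, the function names are hypothetical, and every ordered pair (S, T) is counted, so scores come out at twice the value obtained when each unordered pair is counted once.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (distance, number of geodesics) from source to every reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma


def betweenness(adj):
    """For each node x, sum over ordered pairs (s, t) the share of s-t geodesics through x."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {x: 0.0 for x in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for x in adj:
                if x in (s, t) or x not in dist_s:
                    continue
                dist_x, sigma_x = info[x]
                # x lies on an s-t geodesic when the two legs add up to the full distance.
                if t in dist_x and dist_s[x] + dist_x[t] == dist_s[t]:
                    score[x] += sigma_s[x] * sigma_x[t] / sigma_s[t]
    return score


# Hypothetical 4-cycle: two geodesics join S and T, one via L and one via P.
cycle = {"S": ["L", "P"], "L": ["S", "T"], "P": ["S", "T"], "T": ["L", "P"]}
print(betweenness(cycle))  # every node scores 1.0
```

In the 4-cycle, L carries half of the S-to-T geodesics and half of the T-to-S geodesics, which is where its score of 1.0 comes from.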

Betweenness Flow Processes

• Consistent processes
– postal process
• Nearly consistent
– parallel processes (all routes at same time)
– but ... needn’t choose between geodesics
• Implication
– better for modeling transportation of goods than information

Calculating Betweenness Centrality

How often does a token pass through a node?

| | parallel duplication | serial duplication | transfer |
|---|---|---|---|
| geodesics | NEW | NEW | Freeman |
| paths | NEW | NEW | NEW |
| trails | NEW | NEW | NEW |
| walks | Friedkin? | ? | Friedkin? |

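Eigenvector Centrality

• Eigenvector of adjacency matrix
– in effect, counts number of walks of all lengths emanating from node, weighted inversely by length
• Interpreted as popularity or being in the thick of things
• Assumes flow can return to same nodes & lines

$$\sum_k \beta^k A^k = \beta^1 A^1 + \beta^2 A^2 + \beta^3 A^3 + \dots$$

Scores are the row sums of $\sum_k \beta^k A^k$, where $A$ is the adjacency matrix and $\beta^k$ is the weight on walks of length $k$ (the weight is written here as $\beta$).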
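The walk-counting reading of this slide can be sketched directly: accumulate the row sums of successive powers of the adjacency matrix, each discounted by its length. The weight name `beta`, its default value, and the truncation at `max_len` are assumptions made for illustration; the slides do not specify them. The untruncated series only converges when `beta` is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def weighted_walk_scores(adj, beta=0.5, max_len=20):
    """Row sums of beta^1 A^1 + beta^2 A^2 + ... + beta^max_len A^max_len."""
    nodes = list(adj)
    walks = {n: 1.0 for n in nodes}   # row sums of A^0: one empty walk per node
    score = {n: 0.0 for n in nodes}
    weight = 1.0
    for _ in range(max_len):
        # Walks of length k from n = sum over neighbors m of walks of length k-1 from m.
        walks = {n: sum(walks[m] for m in adj[n]) for n in nodes}
        weight *= beta
        for n in nodes:
            score[n] += weight * walks[n]
    return score


# Hypothetical 3-node chain A - B - C, walks of length 1 and 2 only.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(weighted_walk_scores(chain, beta=0.5, max_len=2))  # {'A': 1.0, 'B': 1.5, 'C': 1.0}
```

The middle node scores highest because more walks of every length start from it.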

“Cross-Platform” Centrality

• How far off are these centrality measures when used with wrong flow process?

• How can we correctly measure closeness and betweenness concepts in different flow contexts?

• Simulation modeling

Realized Centrality

• Essence of closeness is the expected time until arrival of fluenda
– realized closeness is an empirical measurement of the avg time until arrival
– Freeman closeness is an estimator of this
  • model-based formula that should correspond to actual longterm values if the model fits
• Betweenness is expected number of times a fluendum passes through node

Simulation Procedure (for deterministic flow processes)

• For each of 10,000 trials* ...
– For each node,
  • let token originate at the node & propagate according to flow process rules until it can go no further
  • record which nodes are visited along way and # of units of time needed to arrive at each node for first time
– Cumulate realized closeness and realized betweenness

*NOTE: Parallel processes only require 1 trial -- no randomness
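A minimal sketch of one such trial, using the gift process (single token, random trail) as the flow rule. The function names, the seed handling, and the choice to treat ties as undirected pairs are assumptions for illustration. The averaging step counts only trials in which the token actually arrived, which sidesteps but does not solve the problem of a token that gets stuck before reaching a node.

```python
import random


def gift_trial(adj, origin, rng):
    """Move one token along a random trail from origin until no unused tie remains.

    Returns {node: time of first arrival}.
    """
    used = set()                  # ties already travelled, as unordered pairs
    first_arrival = {origin: 0}
    node, time = origin, 0
    while True:
        options = [n for n in adj[node] if frozenset((node, n)) not in used]
        if not options:
            return first_arrival  # token is stuck
        nxt = rng.choice(options)
        used.add(frozenset((node, nxt)))
        node, time = nxt, time + 1
        first_arrival.setdefault(node, time)


def realized_arrival_times(adj, trials=1000, seed=0):
    """Average first-arrival time at each node over all origins and trials."""
    rng = random.Random(seed)
    total = {n: 0 for n in adj}
    count = {n: 0 for n in adj}
    for _ in range(trials):
        for origin in adj:
            for node, t in gift_trial(adj, origin, rng).items():
                if node != origin:
                    total[node] += t
                    count[node] += 1
    return {n: total[n] / count[n] for n in adj if count[n]}


# Hypothetical 3-node chain A - B - C.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(gift_trial(chain, "A", random.Random(0)))  # {'A': 0, 'B': 1, 'C': 2}
```

Swapping in a different `*_trial` function (for example one that allows ties to be reused) would give the corresponding simulation for another flow process.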

Simulation Procedure (for non-deterministic processes)

• For each of 10,000 trials ...
– For each ordered pair of (source,target) nodes
  • let token originate at source node & propagate according to flow process rules until it either reaches target node or can go no further
  • record which nodes are visited and # of units of time needed to arrive at each node for first time
– Cumulate realized closeness and realized betweenness
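For the postal process, the dyadic procedure can be sketched as below: for each ordered (source, target) pair, a package follows one geodesic chosen uniformly at random, and every intermediate node it passes through is tallied. The function names are hypothetical, the network is assumed undirected, and only the betweenness tally is shown.

```python
import random
from collections import deque


def geodesic_counts_to(adj, target):
    """BFS from target: distance to target and number of geodesics to it, per node."""
    dist, sigma = {target: 0}, {target: 1}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma


def realized_postal_betweenness(adj, trials=100, seed=0):
    """Average number of times a package passes through each node per trial."""
    rng = random.Random(seed)
    routes = {t: geodesic_counts_to(adj, t) for t in adj}
    passes = {n: 0 for n in adj}
    for _ in range(trials):
        for source in adj:
            for target in adj:
                dist, sigma = routes[target]
                if source == target or source not in dist:
                    continue
                node = source
                while dist[node] > 1:
                    # Step one hop closer to the target; weighting by sigma makes
                    # every geodesic equally likely.
                    closer = [n for n in adj[node] if dist.get(n) == dist[node] - 1]
                    node = rng.choices(closer, weights=[sigma[n] for n in closer])[0]
                    passes[node] += 1
    return {n: passes[n] / trials for n in adj}


# Hypothetical 3-node chain: B sits on the only route between A and C.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(realized_postal_betweenness(chain, trials=10))  # {'A': 0.0, 'B': 2.0, 'C': 0.0}
```

On networks with several geodesics per pair the tallies fluctuate from run to run and should approach the proportional geodesic counts as the number of trials grows.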

Alternative Methods

• Can use non-deterministic procedure on all processes, for comparability to Freeman betweenness
– numerical results quite different
– but larger conclusions are the same
• But, logically, not sensible
– Freeman’s dyadic method presupposes source & target
  • i.e., non-deterministic process

Empirical Results

• Compare realized closeness & betweenness with Freeman measures across different flow processes
• Dataset is known ties among terrorists compiled by Valdis Krebs
• Start with betweenness

Betweenness in Postal Proc.

| Name | FreeBet | RealBet |
|---|---|---|
| Mohamed Atta | 1106.9 | 1108.4 |
| Essid Sami Ben Khemais | 470.5 | 468.6 |
| Zacarias Moussaoui | 434.5 | 423.0 |
| Nawaf Alhazmi | 287.6 | 294.0 |
| Hani Hanjour | 233.8 | 229.2 |
| Djamal Beghal | 195.7 | 192.9 |
| Marwan Al-Shehhi | 167.0 | 162.3 |
| Satam Suqami | 137.4 | 136.7 |
| Ramzi Bin al-Shibh | 88.2 | 86.2 |
| Abu Qatada | 78.9 | 86.4 |
| Raed Hijazi | 62.6 | 62.4 |
| Tarek Maaroufi | 61.5 | 62.5 |
| Mamoun Darkazanli | 61.0 | 61.0 |
| Imad Eddin Barakat Yarkas | 54.7 | 65.0 |
| Fayez Ahmed | 47.9 | 47.0 |
| Abdul Aziz Al-Omari* | 42.6 | 42.8 |
| Hamza Alghamdi | 40.9 | 44.9 |
| Saeed Alghamdi* | 32.2 | 32.3 |
| Ziad Jarrah | 31.1 | 29.0 |
| Ahmed Al Haznawi | 28.3 | 28.2 |
| Salem Alhazmi* | 23.3 | 23.9 |
| Lotfi Raissi | 21.7 | 21.2 |
| Agus Budiman | 21.1 | 20.9 |
| Ahmed Alghamdi | 13.7 | 14.4 |
| Ahmed Ressam | 13.1 | 13.4 |
| Haydar Abu Doha | 12.5 | 12.9 |
| Kamel Daoudi | 11.8 | 8.4 |
| Khalid Al-Mihdhar | 10.3 | 9.4 |
| Nabil al-Marabh | 6.7 | 6.8 |
| Mohamed Bensakhria | 6.5 | 7.8 |
| Wail Alshehri | 4.5 | 5.0 |
| Mustafa Ahmed al-Hisawi | 4.5 | 5.0 |
| Said Bahaji | 3.6 | 3.8 |
| Jerome Courtaillier | 2.9 | 3.0 |
| Waleed Alshehri | 1.6 | 1.5 |
| Abu Walid | 1.6 | 2.0 |
| Rayed Mohammed Abdullah | 1.5 | 1.5 |
| Mehdi Khammoun | 1.0 | 1.0 |
| Mohand Alshehri* | 1.0 | 1.1 |
| Nabil Almarabh | 0.0 | 0.0 |
| Abdussattar Shaikh | 0.0 | 0.0 |

(all the rest are zeros on both measures)

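Betweenness / Gossip Process

• Sequential duplication across trails: rumors
• Scores (*) standardized to mean = 0, standard deviation = 1
• Real*, Free*: scores; rReal, rFree: ranks

| ID | Real* | Free* | rReal | rFree |
|---|---|---|---|---|
| 6 | 3.843 | 6.384 | 1 | 1 |
| 11 | 3.370 | 0.649 | 2 | 7 |
| 1 | 2.088 | 1.056 | 3 | 5 |
| 16 | 1.409 | -0.181 | 4 | 19 |
| 29 | 1.376 | 0.168 | 5 | 9 |
| 3 | 1.348 | 1.384 | 6 | 4 |
| 10 | 1.048 | -0.111 | 7 | 16 |
| 12 | 0.988 | -0.078 | 8 | 15 |
| 4 | 0.951 | -0.228 | 9 | 21 |
| 9 | 0.921 | 0.468 | 10 | 8 |
| 37 | 0.908 | 2.500 | 11 | 2 |
| 21 | 0.850 | 2.281 | 12 | 3 |
| 25 | 0.501 | -0.349 | 13 | 33 |
| 14 | 0.493 | -0.121 | 14 | 17 |
| 8 | 0.479 | -0.343 | 15 | 31 |
| 7 | 0.478 | -0.361 | 16 | 35 |
| 41 | 0.399 | 0.005 | 17 | 12 |
| 5 | 0.377 | -0.308 | 18 | 28 |
| 19 | 0.336 | -0.174 | 19 | 18 |
| 46 | 0.237 | 0.824 | 20 | 6 |
| 53 | -0.031 | -0.242 | 21 | 23 |
| 24 | -0.031 | -0.238 | 22 | 22 |
| 36 | -0.032 | -0.371 | 23 | 43 |
| 13 | -0.035 | -0.287 | 24 | 24 |
| 47 | -0.041 | 0.111 | 25 | 10 |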
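The standardized scores in this table put the realized and Freeman measures on a common scale. A minimal sketch of that rescaling follows; the function name is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, since the slides do not say which was used.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale a {node: score} mapping to mean 0 and standard deviation 1."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    return {node: (value - mu) / sigma for node, value in scores.items()}


# Hypothetical raw scores for three nodes.
z = standardize({"a": 1.0, "b": 2.0, "c": 3.0})
print(z["b"])  # 0.0, because b sits exactly at the mean
```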

Betweenness / Gossip

(Figure: scatter plot of Realized Betweenness (vertical axis, -2.000 to 7.000) against Freeman Betweenness (horizontal axis, -2.000 to 6.000).)

Betweenness in Gossip Proc.

(Figure: network diagram with callouts marking nodes over-estimated and under-estimated by betweenness centrality. Data courtesy of Valdis Krebs.)

• Token rarely gets to 46, so its realized betweenness cannot be as high as the Freeman measure estimates
• Freeman measure is zero when contacts are connected

Blind vs Guided Flows

• Nodes embedded in dense regions are more important in blind processes than in nondeterministic processes.
– It is in blind processes that we see the bottling-up phenomenon that Granovetter alludes to

(Diagram labels: Path redundancy, Individual performance, Type of flow.)

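Betweenness in Gift Process

• Physical transfer along trails: used paperback
• Scores (*) standardized to mean = 0, standard deviation = 1

| ID | Real* | Free* | rReal | rFree |
|---|---|---|---|---|
| 6 | 3.985 | 6.384 | 1 | 1 |
| 11 | 3.356 | 0.649 | 2 | 7 |
| 1 | 2.188 | 1.056 | 3 | 5 |
| 3 | 1.470 | 1.384 | 4 | 4 |
| 16 | 1.395 | -0.181 | 5 | 19 |
| 29 | 1.328 | 0.168 | 6 | 9 |
| 10 | 1.155 | -0.111 | 7 | 16 |
| 4 | 0.910 | -0.228 | 8 | 21 |
| 12 | 0.906 | -0.078 | 9 | 15 |
| 9 | 0.858 | 0.468 | 10 | 8 |
| 37 | 0.851 | 2.500 | 11 | 2 |
| 14 | 0.656 | -0.121 | 12 | 17 |
| 25 | 0.585 | -0.349 | 13 | 33 |
| 21 | 0.553 | 2.281 | 14 | 3 |
| 7 | 0.401 | -0.361 | 15 | 35 |
| 8 | 0.398 | -0.343 | 16 | 31 |
| 5 | 0.363 | -0.308 | 17 | 28 |
| 19 | 0.350 | -0.174 | 18 | 18 |
| 46 | 0.181 | 0.824 | 19 | 6 |
| 13 | 0.166 | -0.287 | 20 | 24 |
| 24 | 0.166 | -0.238 | 21 | 22 |
| 53 | 0.132 | -0.242 | 22 | 23 |
| 36 | 0.120 | -0.371 | 23 | 43 |
| 41 | 0.110 | 0.005 | 24 | 12 |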

Betweenness / Gift

(Figure: scatter plot of Freeman Centrality (vertical axis) against Realized (horizontal axis), both running from -2.000 to 7.000. Fitted line: y = 0.6902x + 0.0048, R² = 0.4764.)

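Closeness in Gossip Process

• Sequential duplication across trails: rumors
• Scores* standardized to mean = 0, standard deviation = 1
• Correlation is high -- much better than betweenness corr
• Real*, Free*: scores; rReal, rFree: ranks

| ID | Real* | Free* | rReal | rFree |
|---|---|---|---|---|
| 6 | -2.493 | -1.691 | 1 | 1 |
| 11 | -1.612 | -1.637 | 2 | 2 |
| 29 | -1.286 | -1.582 | 3 | 5 |
| 16 | -1.156 | -1.528 | 4 | 9 |
| 1 | -1.384 | -1.473 | 5 | 3 |
| 10 | -1.189 | -1.418 | 6 | 8 |
| 25 | -0.863 | -1.364 | 7 | 16 |
| 3 | -1.384 | -1.309 | 8 | 4 |
| 21 | -1.286 | -1.255 | 9 | 6 |
| 9 | -1.058 | -1.200 | 10 | 10 |
| 4 | -0.406 | -1.146 | 11 | 24 |
| 62 | -0.993 | -1.091 | 12 | 11 |
| 12 | -0.993 | -1.037 | 13 | 12 |
| 36 | -0.765 | -0.982 | 14 | 17 |
| 24 | -0.928 | -0.927 | 15 | 13 |
| 53 | -0.765 | -0.873 | 16 | 18 |
| 8 | -0.928 | -0.818 | 17 | 14 |
| 55 | -0.732 | -0.764 | 18 | 19 |
| 18 | -0.895 | -0.709 | 19 | 15 |
| 22 | -0.732 | -0.655 | 20 | 20 |
| 14 | -0.374 | -0.600 | 21 | 25 |
| 37 | -1.254 | -0.546 | 22 | 7 |
| 27 | -0.732 | -0.491 | 23 | 21 |
| 7 | 0.050 | -0.164 | 29 | 29 |
| 13 | -0.015 | -0.327 | 26 | 26 |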

Closeness / Gossip

(Figure: scatter plot of Realized Closeness (vertical axis) against Freeman Closeness (horizontal axis), both running from -3.000 to 3.000.)

Closeness in Gossip Process

(Figure: network diagram with callouts marking nodes over-estimated and under-estimated by closeness centrality. Colors based on average arrival times. Data courtesy of Valdis Krebs.)

• In gossip process, token gets bottled up by dense regions, takes long time to escape to other groups. Hard for blind process to find way out.

Closeness in Currency Process

(Figure: scatter plot; horizontal axis: Freeman Closeness; both axes run from -3.00 to 3.00.)

Lack of Symmetry

• In many processes, avg distance to node does not equal distance from the node
– even though network is symmetrical
• People who can reach others in few steps are NOT the same as people who can be reached by others in few steps
– Freeman closeness uncorrelated w/ former

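Asymmetry Due to Degree Variance

“Distance” Matrix (axes labeled From and To)

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 5.7 | 5.2 | 4.6 | 5.6 | 4.6 | 13.2 | 11.3 | 10.6 | 5.3 |
| 2 | 2.4 | | 2.3 | 2.4 | 2.4 | 6.3 | 14.4 | 12.7 | 11.8 | 7.1 |
| 3 | 4.2 | 4.8 | | 4.4 | 4.2 | 4.9 | 14.5 | 12.6 | 11.4 | 8.1 |
| 4 | 3.3 | 4.0 | 3.7 | | 3.9 | 5.7 | 12.7 | 11.2 | 10.3 | 3.9 |
| 5 | 3.0 | 3.2 | 2.6 | 3.1 | | 6.6 | 15.0 | 13.3 | 12.3 | 7.8 |
| 6 | 6.9 | 20.8 | 8.5 | 11.1 | 20.6 | | 13.0 | 7.8 | 7.5 | 7.1 |
| 7 | 7.6 | 20.0 | 11.0 | 8.9 | 19.8 | 4.1 | | 3.1 | 3.1 | 3.0 |
| 8 | 7.4 | 19.8 | 10.7 | 8.9 | 19.6 | 3.0 | 3.1 | | 3.1 | 3.0 |
| 9 | 7.7 | 19.4 | 10.1 | 9.2 | 19.1 | 3.6 | 3.8 | 3.8 | | 3.6 |
| 10 | 4.2 | 16.7 | 9.4 | 4.3 | 16.5 | 3.7 | 4.1 | 4.2 | 4.0 | |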
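The asymmetry is easy to reproduce with a blind walk (currency-style process) on a network whose degrees vary. The sketch below is an illustration only: the star network, the function name, and the trial count are assumptions, not the data behind the matrix above.

```python
import random


def mean_steps(adj, source, target, trials=500, seed=0):
    """Average number of random-walk steps for a token to go from source to target."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        node, steps = source, 0
        while node != target:
            node = rng.choice(adj[node])
            steps += 1
        total += steps
    return total / trials


# Hypothetical star: one hub tied to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
to_hub = mean_steps(star, "a", "hub")     # always exactly 1 step
from_hub = mean_steps(star, "hub", "a")   # longer on average: the walk may visit b or c first
print(to_hub, from_hub)
```

The tie between the hub and leaf "a" is symmetric, yet the token reaches the high-degree hub in one step while the reverse trip takes 5 steps in expectation, since two of every three moves out of the hub go to the wrong leaf.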

Lack of Computability

• Closeness in Gift Process
– Gift gets stuck in cul-de-sac, resulting in infinite time/distance
– Can’t compute expected time until arrival

| | parallel duplication | serial duplication | transfer |
|---|---|---|---|
| geodesics | Freeman | Freeman | Freeman |
| paths | Freeman | NEW | NO |
| trails | Freeman | NEW | NO |
| walks | Freeman | ? | Markov |

Correlations Among Centralities

| | EgoDensity | Eigenvec | FreeClos | FreeBet | BetGossip | BetEmail | BetGift | BetInfect | BetMoney | ReaGift | CloGossip | CloInfect | CloMoney |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EgoDensity | 1.000 | -0.180 | 0.362 | -0.437 | -0.437 | -0.142 | -0.425 | -0.660 | -0.472 | -0.449 | 0.268 | 0.411 | 0.346 |
| Eigenvec | -0.180 | 1.000 | -0.797 | 0.503 | 0.880 | 0.376 | 0.901 | 0.501 | 0.853 | 0.881 | -0.763 | -0.861 | -0.777 |
| FreeClos | 0.362 | -0.797 | 1.000 | -0.532 | -0.780 | -0.480 | -0.780 | -0.606 | -0.761 | -0.825 | 0.900 | 0.881 | 0.816 |
| FreeBet | -0.437 | 0.503 | -0.532 | 1.000 | 0.683 | 0.059 | 0.685 | 0.348 | 0.757 | 0.610 | -0.356 | -0.501 | -0.376 |
| BetGossip | -0.437 | 0.880 | -0.780 | 0.683 | 1.000 | 0.489 | 0.994 | 0.710 | 0.985 | 0.961 | -0.707 | -0.878 | -0.825 |
| BetEmail | -0.142 | 0.376 | -0.480 | 0.059 | 0.489 | 1.000 | 0.477 | 0.728 | 0.423 | 0.599 | -0.608 | -0.631 | -0.753 |
| BetGift | -0.425 | 0.901 | -0.780 | 0.685 | 0.994 | 0.477 | 1.000 | 0.689 | 0.987 | 0.968 | -0.717 | -0.876 | -0.824 |
| BetInfect | -0.660 | 0.501 | -0.606 | 0.348 | 0.710 | 0.728 | 0.689 | 1.000 | 0.684 | 0.785 | -0.632 | -0.751 | -0.840 |
| BetMoney | -0.472 | 0.853 | -0.761 | 0.757 | 0.985 | 0.423 | 0.987 | 0.684 | 1.000 | 0.948 | -0.662 | -0.839 | -0.775 |
| ReaGift | -0.449 | 0.881 | -0.825 | 0.610 | 0.961 | 0.599 | 0.968 | 0.785 | 0.948 | 1.000 | -0.814 | -0.938 | -0.923 |
| CloGossip | 0.268 | -0.763 | 0.900 | -0.356 | -0.707 | -0.608 | -0.717 | -0.632 | -0.662 | -0.814 | 1.000 | 0.901 | 0.893 |
| CloInfect | 0.411 | -0.861 | 0.881 | -0.501 | -0.878 | -0.631 | -0.876 | -0.751 | -0.839 | -0.938 | 0.901 | 1.000 | 0.956 |
| CloMoney | 0.346 | -0.777 | 0.816 | -0.376 | -0.825 | -0.753 | -0.824 | -0.840 | -0.775 | -0.923 | 0.893 | 0.956 | 1.000 |
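Each cell of a matrix like this can be computed from two score vectors listed in the same node order. The sketch below assumes the entries are Pearson product-moment correlations, which the slides do not state explicitly; the function name and sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical scores: a distance-sum closeness (low = central) against a
# betweenness-style score (high = central) for the same four nodes.
closeness_sums = [6, 4, 4, 6]
through_traffic = [0.0, 4.0, 4.0, 0.0]
print(pearson(closeness_sums, through_traffic))  # close to -1
```

A closeness score built from distance sums is low for central nodes, so it tends to correlate negatively with measures that are high for central nodes, which would be consistent with the negative entries between the closeness and betweenness columns.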

MDS of Correlations Among Centrality Scores

(Figure: MDS plot; both axes run from -2.26 to 2.20. Plotted points: FreeClos, Eigenvec, FreeBet, BetGossip, BetEmail, BetGift, ReaGift, CloGossip, CloInfect, BetInfect, CloMoney, BetMoney.)

Summary

• Variety of flow processes
– Distinguished by a system of properties
• Key properties include
– blind / guided
– copy / move
– serial / parallel
– path / trail / walk

Summary -- cont.

• Properties combine to form set of rules that determine how things flow
• These rules interact with structural location to determine
– who gets things earliest
– who gets a lot of traffic
– i.e., structural importance

Summary -- cont.

• Centrality measures make assumptions about the kinds of flow processes
• Freeman measures only consistent with a few flow processes
– When applied to other flow processes they get the “wrong” answer -- i.e., are not interpretable in the obvious way
– For other processes, need new measures (or use simulation) -- where computable

Assumptions

• We can separate concepts from measures
– essence of closeness is time until arrival
• Betweenness makes sense in deterministic context

Tentative Conclusions

• Applied to deterministic (blind) flows, Freeman measures over-estimate importance of peripherals and under-estimate importance of core nodes
– flows get bottled-up in dense areas with many redundant paths

(Diagram labels: Path redundancy, Individual performance, Type of flow.)

To Do List

• Anova-like comparison of results across different boxes in typology
– is the big difference deterministic vs non? Walk vs path/trail? Copy vs move?
• How does the shape of the network interact with the rules of flow to produce different structural importances for the nodes? -- core/periphery structures
– Take loosely game-theoretic approach

To Do List -- cont.

• Construct analytic measures appropriate for each flow process
• Extend to directed graphs and probabilistic ties

Stop Talking.

Group Centrality

• These 3 nodes directly reach 42 people (2/3 of whole)

Data courtesy of Valdis Krebs

Group Centrality

• These 5 nodes directly reach 54 people (86% of whole)

Data courtesy of Valdis Krebs
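The "directly reach" counts on these group centrality slides can be computed as the number of non-members tied to at least one group member. A minimal sketch, with a hypothetical function name and toy network:

```python
def group_reach(adj, group):
    """Count nodes outside the group that are tied to at least one group member."""
    members = set(group)
    reached = set()
    for member in members:
        reached.update(adj[member])
    return len(reached - members)


# Hypothetical network: x and y between them touch a, b, and c, but not d.
net = {
    "x": ["a", "b", "y"], "y": ["x", "c"],
    "a": ["x"], "b": ["x"], "c": ["y", "d"], "d": ["c"],
}
print(group_reach(net, ["x", "y"]))  # 3
```

Dividing the result by the number of non-members gives the share of the rest of the network that the group reaches in one step.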

Encourage Divisions

• Look for and encourage differences in the regions

Data courtesy of Valdis Krebs

Minimum Weight Cutsets

• Cutting just 2 nodes (or 5 ties) splits the network into two pieces

(Figure: network diagram; labeled nodes: Mohamed Atta, Ramzi Bin al-Shibh. Data courtesy of Valdis Krebs.)

Note: data include persons now dead.
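Whether a candidate set of nodes is a cutset can be verified by deleting them and counting the pieces that remain. The sketch below only checks a given set; it does not search for a minimum weight cutset. The function name and the toy network are assumptions for illustration.

```python
def count_components(adj, removed=()):
    """Number of connected pieces left after deleting the nodes in `removed`."""
    removed = set(removed)
    seen = set()
    pieces = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        pieces += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return pieces


# Hypothetical network: two triangles joined only through node "m".
net = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "m"],
    "m": ["c", "d"],
    "d": ["m", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(count_components(net), count_components(net, removed=["m"]))  # 1 2
```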

Objectives

• Uncover assumptions behind measures of centrality
– assumptions about how things flow in nets
• Create a “theory” of how properties of flows affect structural importance of nodes
– enable appropriate measurement

Proposition

• Classical measures of centrality presuppose certain properties of flows
– may be inappropriate for flows with different properties
• More theoretically: Flow properties determine the structural importance of nodes

Test Procedures

• In non-deterministic and in parallel duplication processes, realized closeness should match Freeman closeness calculation
• In postal processes, realized betweenness should match Freeman betweenness calculation

SNA Tactics “r” Short Term

• SNA helps decide which nodes & ties to prune, but ...
– There are ties we don’t know about
– New ties develop
– Nodes are replaced
– 11 Sept experiment
• Best used at cusp moments to break up a specific attack

Medium Term Tactics

• Targeted harassment to shape network (e.g., increase redundancy of paths)
– reduces efficiency of communication?
– information distortion can create confusion
• Adding fake nodes
– spreading slightly changed information
  • create trust problem
– place fake nodes in key network positions

Medium Term

• Push network into maladaptive shape
– factions
– brittle network with many cutpoints
– dense networks with redundant pathways


Long-Term Tactics

• Dry up the money
• Eliminate recruitment
– make other career paths more attractive
– competing glamorous groups, some fake
– whispering campaign to discredit members
– minimize visible confrontation

Long-Term Tactics

• Eliminate al Qaeda training system
– when communication is difficult, coordination is achieved via common training
– creates bond that is activated later to execute an operation

Terrorist Networks?

• Long-term suggestions largely organizational in character
– is there really an advantage to conceptualizing terrorist groups as networks?
• Is al Qaeda
– any more of a network than corporations?
– and any less of a formal organization?

Characteristics of Formal Organizations

– documented procedures / company manual
– occupational training / company orientation
– functional division of labor & specialization
– unity of command
– career management
– coordination based on authority & standardized work processes (professional training)
– targeted communication (unlike meandering gossip)

al Qaeda as Formal Org

– detailed manuals
  • technical info, management info, logistical -- what kind of house to rent
– lengthy & rigorous training courses
– functional division of cells
– movement of personnel

al Qaeda as Formal Org

– centralized decision-making
  • coordination of efforts across countries
    – attacks on US embassies in Tanzania & Kenya within 9 minutes of each other
  • “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” - Usama bin Laden
– communication non-deterministic?

Conclusions

• Conventional network approach of using centrality measures to identify targets needs to be modified
– calculate appropriate measures of structural importance
• Even so, eliminating nodes provides only short term relief
• Some network-informed medium term tactics

Conclusions -- cont.

• Long term, we must
– address the finance, recruitment & training issues
• These are not fundamentally network issues
– is al Qaeda really more like a social network than a formal organization?
– Going along with the media hype

Conclusions -- cont.

• Org’l theory might be more fertile ground for developing defensive tactics
– resource dependency.
  • Don’t cut off funds, control them.
– contingency theory. Complicate environs
  • e.g., face other terrorist groups: Ath. Lib. Fr.
– network governance.
  • Make “friends” with al Qaeda’s allies
– institutional theory. Fight legitimacy.
