MINFS544: Business Network Data
Analytics and Applications
Feb 24th, 2016
Daning Hu, Ph.D.
Department of Informatics
University of Zurich
F. Schweitzer et al., Science 2009
Stop Contagious Failures in Banking Systems
During the 2008 financial tsunami, which bank(s) should we inject capital into first to stop contagious failures in bank networks?
Utilize Peer Influence in Online Social Networks
Intelligent Advertising, Product Recommendation
Who are the most influential people?
What are the patterns of product diffusion?
Develop Strategies to Attack Terrorist Networks
A Global Salafi Jihad Terrorist Network (Hu et al., JHSEM 2009)
How to effectively break down a terrorist network?
Network-based Business Intelligence
Network-based (Modeling and Analysis)
Modeling and analyzing various real-world social and organizational
networks to understand:
the cognitive and economic behaviors of the network actors; and
the dynamic processes behind the network evolution
Based on the above…
Business Intelligence (BI)
Design network-based BI algorithms and information systems to
provide decision support in various application domains
Financial Risk Management, Security Informatics, and Knowledge
Management, etc.
Network Analysis, Simulation of Network Evolution, Data Mining, etc.
Summary
• Lecturer: Dr. Daning Hu; Teaching Assistant: David Xiao Li
• Email: [email protected] [email protected]
• Credits: 3 ECTS credits
• Course web page:
http://www.ifi.uzh.ch/bi/teaching/Spring2016/Lecture1.html
• Language: English
• Audience: Master and doctoral students
• Office Hours: Tue 13:00–14:00, Room 2.A.12; please send an email to make an appointment.
• Grading: Course report (term paper) 80% and interactions 20%
Grading
• 1. A full research paper (80%). The format of this paper can
be found at:
• http://icis2016.aisnet.org/call-dates/submission-guidelines/
• * If possible, get it published in ICIS 2016 and get it cited.
• This paper should include answers to the following
questions:
– What is the research problem?
– Why is it interesting and important?
– Why is it hard? Why have previous approaches failed?
– What are the key components of your approach?
– What 1) models, 2) data sets and 3) metrics will be used to validate
the approach?
A Brief History of Network Science
1736: Mathematical foundation – Graph Theory
1930: Social Network Analysis and Theories
  Sociogram: Network visualization
  Six degrees of separation
  Structural hole: Source of innovation
1990 (Physicists): Complex Network Topologies
  Small-world model (e.g., WWW)
  Scale-free model ("Rich get richer")
2000: Network Science
  Economic networks (Agent modeling & simulation)
  Dynamic network analysis
  BI applications: product diffusion in social media, recommendation systems
2012: ?
Outline
Introduction
Dynamic Analysis of Dark Networks
A Global Salafi Jihad (GSJ) Terrorist Network
A Narcotic Criminal Network
A Network Approach to Managing Bank Systemic Risk
Ongoing Work
Conclusion
Dynamic Network Analysis (DNA): What, Why, How
Model the changes in network evolution
  Temporal changes in network topological measures
  Dynamic network recovery on longitudinal data
Study the dynamic link formation processes behind network evolution (nodes forming links drive network evolution)
  Statistical analysis of determinants behind link formation
  Homophily, preferential attachment, shared affiliations
Simulate the evolution of networks
  Agent-based modeling and simulation
  Examine network robustness
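Preferential attachment, one of the link-formation mechanisms listed above, is also the mechanism behind the scale-free ("rich get richer") model. A minimal agent-based sketch in the spirit of the textbook Barabási–Albert model, assuming every new node brings exactly m links; the function name and the complete seed graph are illustrative choices, not the simulation used in the GSJ study:

```python
import random


def preferential_attachment(n_nodes, m, seed=None):
    """Grow an undirected network by preferential attachment.

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree ("rich get richer").
    """
    if n_nodes <= m:
        raise ValueError("n_nodes must exceed m")
    rng = random.Random(seed)
    # Seed network (an assumption): a complete graph on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears here once per link it has, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges


edges = preferential_attachment(1000, 2, seed=42)
```

Because well-connected nodes are drawn more often, early nodes tend to accumulate many more links than late ones, which yields a heavy-tailed degree distribution as the network grows.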
Research Testbed: A Global Terrorist Network
The Global Salafi Jihad (GSJ) network data was compiled by former CIA operations officer Dr. Marc Sageman and covers 366 terrorists:
Links: friendship, kinship, same religious leader, operational interactions, etc.
Attributes: geographical origins, socio-economic status, education, etc.
Timing: when they joined and left the GSJ
The goals of the dynamic analysis:
gain insights into the evolution of the GSJ network
develop effective attack strategies to break down the GSJ network
Sample data of GSJ terrorists
Temporal Changes in Network-level Measures
Fig.1. The temporal changes in the (a) average degree <k> and (b), (c) degree distribution of the GSJ network
(a) Average degree <k> per year, 1989–2003
(b) Probability of degree vs. degree for 1990, 1991 and 1993, with a Poisson distribution for comparison
(c) Probability of degree vs. degree for 1995, 1997 and 1999
Degree = number of links a node has
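Fig.1 rests on two simple quantities: the average degree <k> and the empirical degree distribution P(k), which the early years are compared against a Poisson curve. A minimal sketch of both, assuming an undirected network given as a node list and an edge list; the function names and the toy graph are illustrative only:

```python
import math
from collections import Counter


def average_degree(nodes, edges):
    """<k> = 2E / N: each undirected edge adds one to the degree of two nodes."""
    return 2 * len(edges) / len(nodes)


def degree_distribution(nodes, edges):
    """Empirical P(k): the fraction of nodes that have exactly k links."""
    deg = dict.fromkeys(nodes, 0)  # isolated nodes keep degree 0
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg.values())
    return {k: c / len(nodes) for k, c in sorted(counts.items())}


def poisson_pmf(k, mean):
    """Approximate P(k) of a large Erdős–Rényi random network with <k> = mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
k_avg = average_degree(nodes, edges)     # 2.0
p_k = degree_distribution(nodes, edges)  # {1: 0.25, 2: 0.5, 3: 0.25}
for k, p in p_k.items():
    print(k, p, round(poisson_pmf(k, k_avg), 3))
```

Plotting P(k) against the Poisson curve for each year is what distinguishes a random topology (close fit) from a scale-free one (a long tail of high-degree nodes).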
Findings
There are three stages in the evolution of the GSJ network:
1989–1993: The emerging stage
  The network grows in size
  Accelerated growth: the number of edges increases faster than the number of nodes
  Random network topology (Poisson degree distribution)
1994–2000: The mature stage
  The size of the network reached its peak in 2000
  Scale-free topology (power-law degree distribution)
2001–2003: The disintegration stage
  The network fell apart into small disconnected components after 9/11
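"Accelerated growth" can be checked directly from yearly node and edge counts: for positive counts, edges growing faster than nodes in a given year is equivalent to the average degree <k> = 2E/N rising that year. A minimal sketch with toy counts (hypothetical numbers, not the GSJ data):

```python
def growth_rates(series):
    """Relative change between consecutive years: (x_t - x_{t-1}) / x_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def is_accelerated(nodes_per_year, edges_per_year):
    """True if edges grow faster than nodes in every interval, i.e. the
    average degree <k> = 2E/N rises every year (positive counts assumed)."""
    return all(
        g_e > g_n
        for g_n, g_e in zip(growth_rates(nodes_per_year), growth_rates(edges_per_year))
    )


# Toy counts for illustration only (not the GSJ data).
nodes_per_year = [10, 20, 40]
edges_per_year = [10, 30, 100]
print(is_accelerated(nodes_per_year, edges_per_year))               # True
print([2 * e / n for n, e in zip(nodes_per_year, edges_per_year)])  # [2.0, 3.0, 5.0]
```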
Temporal Changes in Node Centrality Measures
Fig.2. Temporal changes in the Degree and Betweenness centrality of Osama Bin Laden, 1989–2002 (panels: Degree; Betweenness)
Degree: number of links a node has
Betweenness of a node i: number of shortest paths from all nodes to all others that pass through node i
It measures i's influence on the traffic (information, resources) flowing through it
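Both centralities can be computed in a few lines. In the standard definition, when a pair (s, t) has several shortest paths, node i is credited with the fraction of them that pass through it; Brandes' algorithm computes this efficiently. A minimal sketch, assuming an unweighted, undirected network stored as a dict mapping each node to a set of neighbours (unnormalised scores, each pair counted once):

```python
from collections import deque


def degree(adj):
    """Degree centrality: the number of links each node has."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                      # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):               # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree(star)["hub"], betweenness(star)["hub"])  # 3 3.0
```

A node like the star's hub sits on every path between the others, which is why a drop in Bin Laden's betweenness signals that traffic was routed around him.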
Findings and Possible Explanations
1994–1996: A sharp decrease in Bin Laden's betweenness
1994: Saudi Arabia revoked his citizenship and expelled him
1995: Went to Sudan and was expelled again under U.S. pressure
1996: Went to Afghanistan and established camps there
1998–1999: Another sharp decrease in his betweenness
After the 1998 bombings of the U.S. embassies, Bill Clinton ordered a freeze on assets linked to bin Laden (top 10 most wanted)
August 1998: A failed U.S. assassination attempt on him
1999: The UN imposed sanctions against Afghanistan to force the Taliban to extradite him
Research Testbed: A Narcotic Criminal Network
The COPLINK dataset contains 3 million police incident reports from the Tucson Police Department (1990 to 2006), involving 1.44 million individuals:
Their personal and sociological information (age, ethnicity, etc.)
Time information: when two individuals co-offend
Arizona inmate affiliation data: when and where an inmate was housed
A Narcotic Criminal Network
19,608 individuals involved in organized narcotic crimes
29,704 co-offending pairs (links)
Table 1. Summary of the COPLINK dataset and the Arizona inmate dataset
COPLINK narcotic data: 36,548 people; 1990–2006
Arizona inmate data: 165,540 people; 1985–2006
Overlap (identified by first name, last name and DOB): 19,608 people; 17 years
Statistical Analysis of Determinants for Link Formation
Proportional hazards model (Cox regression analysis):
h(t, x_1, x_2, x_3, ...) = h_0(t) exp(b_1 x_1 + b_2 x_2 + b_3 x_3 + ...)
Determinants examined:
Homophily in age (group) and race
Shared affiliations:
Mutual acquaintances (through crimes)
Vehicle affiliation (the same vehicle used by two individuals in different crimes)
Fig.3. Results of multivariate survival (Cox regression) analysis of triadic closure (link formation).
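In the proportional hazards model, h0(t) is an unspecified baseline hazard; the coefficients b_k are estimated without ever estimating h0(t), by maximising Cox's partial likelihood. A minimal sketch in plain Python, assuming each observation is a pair of individuals (a dyad) followed until it either forms a co-offending link or is censored, and assuming no tied event times; the covariate encoding (e.g., 1 if same race, number of mutual acquaintances) is hypothetical. In practice one would fit the model with a statistics package, for example the CoxPHFitter class in the Python lifelines library.

```python
import math


def linear_predictor(beta, x):
    """b1*x1 + b2*x2 + ... for one observation."""
    return sum(b * xi for b, xi in zip(beta, x))


def hazard_ratio(beta, x):
    """exp(b . x): hazard relative to the baseline h0(t), which cancels out."""
    return math.exp(linear_predictor(beta, x))


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Cox partial log-likelihood, assuming no tied event times.

    times[i]      -- time until dyad i formed a link, or until censoring
    events[i]     -- 1 if the link formed, 0 if the dyad was censored
    covariates[i] -- covariate vector for dyad i
    """
    ll = 0.0
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored dyads only enter through the risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += linear_predictor(beta, covariates[i]) - math.log(
            sum(hazard_ratio(beta, covariates[j]) for j in risk_set)
        )
    return ll
```

Once fitted, exp(b_k) reads as a hazard ratio: a one-unit increase in x_k multiplies the link-formation hazard by exp(b_k), with the other covariates held fixed.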
BI Application: Co-offending Prediction in COPLINK
IBM's COPLINK is an intelligent police information system that aims to help speed up the crime detection process.
COPLINK calculates a co-offending likelihood score based on the proportional hazards model.
It produces a ranked list of individuals based on their predicted likelihood of co-offending with the suspect under investigation.
Fig.4. Screenshots of the COPLINK system
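Because the baseline hazard h0(t) is shared by every candidate at a given time, ranking candidates by the relative hazard exp(b·x) gives the same order as ranking by the full hazard. A minimal sketch of how such a ranked list could be produced under that assumption; the feature names and coefficient values are hypothetical and are not COPLINK's:

```python
import math

# Hypothetical coefficients for illustration; NOT values fitted by COPLINK.
BETA = {
    "same_race": 0.4,
    "same_age_group": 0.3,
    "mutual_acquaintances": 0.8,  # per shared acquaintance
    "shared_vehicle": 1.2,
}


def relative_hazard(features):
    """exp(b . x) for one suspect-candidate pair; missing features count as 0."""
    return math.exp(sum(BETA[name] * value for name, value in features.items()))


def rank_candidates(candidates):
    """Return candidate IDs ordered from most to least likely co-offender."""
    return sorted(candidates, key=lambda c: relative_hazard(candidates[c]), reverse=True)


candidates = {
    "A": {"mutual_acquaintances": 3},
    "B": {"same_race": 1, "shared_vehicle": 1},
    "C": {"same_age_group": 1},
}
print(rank_candidates(candidates))  # ['A', 'B', 'C']
```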
Simulate Attacks on Dark Networks
Three attack (i.e., node removal) strategies:
Attack on hubs (highest degree)
Attack on bridges (highest betweenness)
Real-world attack (attack order based on real-world data)
Simulate two types of attacks to examine the robustness of the Dark networks:
Simultaneous attacks (the degree/betweenness of nodes is NOT updated after each removal) – Static
Progressive attacks (the degree/betweenness of nodes is updated after each removal) – Dynamic
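The difference between simultaneous and progressive attacks is whether the target ranking is recomputed after each removal. A minimal sketch of a hub attack in both modes, assuming robustness is measured by S, the size of the largest connected component left as a fraction of the original number of nodes; the adjacency-dict format (node to set of neighbours) and the function names are illustrative:

```python
from collections import deque


def largest_component_fraction(adj, removed):
    """S: size of the largest connected component left after removing
    `removed`, as a fraction of the original number of nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)


def hub_attack(adj, n_remove, progressive):
    """Remove the n_remove highest-degree nodes and return S.

    progressive=False (simultaneous/static): rank nodes once, on the intact network.
    progressive=True (dynamic): recompute degrees among survivors after each removal.
    """
    static_order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    gone = set()
    for k in range(n_remove):
        if progressive:
            alive = [v for v in adj if v not in gone]
            target = max(alive, key=lambda v: len(adj[v] - gone))
        else:
            target = static_order[k]
        gone.add(target)
    return largest_component_fraction(adj, gone)


adj = {
    "A": {"B", "C", "D", "E"}, "B": {"A", "C", "D"}, "C": {"A", "B"},
    "D": {"A", "B"}, "E": {"A"},
    "X": {"y1", "y2", "y3"}, "y1": {"X"}, "y2": {"X"}, "y3": {"X"},
}
print(hub_attack(adj, 2, progressive=False))  # S = 4/9
print(hub_attack(adj, 2, progressive=True))   # S = 3/9
```

In this toy network B looks important only because of its link to A; the progressive attack notices that after A is gone and hits X instead. Ranking by betweenness instead of degree (recomputed on the surviving nodes in the progressive case) turns the same loop into a bridge attack.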
Hub vs. Bridge Attacks
Both hub and bridge attacks are far more effective than real-world arrests – Policy implications?
Both Dark networks are more vulnerable to bridge attacks than to hub attacks.
Bridge (highest betweenness): field lieutenants, operational leaders, etc.
Hub (highest degree): e.g., Bin Laden
GSJ: S and <s> vs. fraction of nodes removed (curves: S (Hub attacks), S (Bridge attacks))
Summary and Contributions
We developed a set of Dynamic Network Analysis (DNA)
methods that are effective in
Linking network topological changes to analytical insights
Systematically capturing the link formation processes
Examining the determinants of link formation
Dark networks are
robust against real-world attacks
but vulnerable to targeted bridge attacks
COPLINK provides real-time decision support for fighting crime.
Research Readings and Resources
• 1. Networks Overview:
• * Statistical mechanics of complex networks, Sections III and VI
– http://rmp.aps.org/abstract/RMP/v74/i1/p47_1
• * Networks, Crowds, and Markets:
– http://www.cs.cornell.edu/home/kleinber/networks-book/
• 2. Networks in Finance:
• * Financial Networks blog and research databases:
– WRDS database
– http://www.financialnetworkanalysis.com/research-database/
– http://www.stern.nyu.edu/networks/electron.html
– * Company Board Social Networks
Research Readings and Resources (cont.)
• 3. Networks in Marketing:
– * Sinan Aral’s research in networks and marketing
– Peer influence
– http://web.mit.edu/sinana/www/
• * Social Media based Marketing:
– http://searchengineland.com/guide/what-is-social-media-marketing
• 4. Recommender Systems:
– http://www-cs-students.stanford.edu/~adityagp/recom.html
• 5. Word-of-Mouth Effects in Social Networks:
– http://papers.ssrn.com/sol3/papers.cfm?abstract_id=393042&