Moscow, 2016
Big data in a cyberspace: recognition and simulation of critical phenomena in social networks
Butakov N., Bochenina K., Boukhanovsky A.
A social network is a set of interconnected actors that generate events (e.g. interact) and thus may form different processes, such as information or infection spreading in the population.
A critical phenomenon is essentially a sharp change in the behavior of a subset of individuals or of the whole network that significantly affects the population and changes the properties of the network or of the processes on it.
The goal is to identify precursors of such phenomena and to estimate how the situation may develop and what level of intervention is required.
Social networks
Reaching this goal requires a data-driven approach:
Adaptivity to the specific situation through streaming data, which leads to better solutions
Greater flexibility in decisions compared to a precedent-based approach (a new solution may emerge)
Requires fast processing of huge amounts of data
Data-driven approach
Three scales of a social network:
Macro-scale: modeling of quantitative characteristics of the population
Meso-scale: modeling of changes in state for individual actors
Micro-scale: modeling of events that happen due to actors' interactions
Scales of a social network
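A minimal sketch of how the three scales can relate to each other, assuming micro-scale events are (time, source, target, kind) tuples, the meso-scale state of an actor is a hypothetical "active"/"passive" label derived from how many events it initiated, and the macro-scale characteristic is the share of active actors. The function names, event kinds and threshold are illustrative only.

```python
from collections import Counter

# Micro-scale: elementary events produced by interactions of actors.
# Each event is (timestamp, source_actor, target_actor, kind); the kinds
# are hypothetical examples.
events = [
    (1, "alice", "bob", "comment"),
    (2, "bob", "carol", "repost"),
    (3, "alice", "carol", "comment"),
    (5, "dave", "alice", "like"),
]


def meso_states(events, active_threshold=2):
    """Meso-scale: derive a state for every actor from its events.

    Hypothetical rule: an actor is 'active' if it initiated at least
    `active_threshold` events, otherwise 'passive'.
    """
    initiated = Counter(src for _, src, _, _ in events)
    actors = {a for _, src, dst, _ in events for a in (src, dst)}
    return {
        a: "active" if initiated[a] >= active_threshold else "passive"
        for a in actors
    }


def macro_share(states, state="active"):
    """Macro-scale: share of the population that is in `state`."""
    return sum(1 for s in states.values() if s == state) / len(states)


states = meso_states(events)
print(states["alice"], macro_share(states))
```

Here only alice initiates two events, so one of the four actors is active and the macro-scale share is 0.25.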
The first step: data source (crawler)
Distributed and multichannel data collection
Multitenancy
Support for multiple networks and smooth automatic transition between them: moving beyond the borders of a particular network
Data source for big data processing systems
Properly capturing the network state may require not only observing the initial network but also tracing processes and interactions between independent networks, i.e. moving “site-to-site”
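The cross-network transition can be sketched as a breadth-first crawl whose links carry the name of the network they point into. This is a single-process illustration under stated assumptions: `NETWORKS` is a hypothetical in-memory stand-in for real networks, and `fetch` stands in for a per-network adapter (API client or scraper); the distributed, multichannel and DSL-driven parts of the crawler are not modeled.

```python
from collections import deque

# Hypothetical in-memory "networks": page id -> links found on that page.
# A link is (network_name, page_id), so a page can point beyond the
# borders of its own network.
NETWORKS = {
    "net_a": {"a1": [("net_a", "a2"), ("net_b", "b1")], "a2": []},
    "net_b": {"b1": [("net_a", "a1"), ("net_b", "b2")], "b2": []},
}


def fetch(network, page_id):
    """Stand-in for a per-network adapter (API client, scraper, ...)."""
    return NETWORKS[network].get(page_id, [])


def crawl(seeds, max_pages=100):
    """Breadth-first crawl that follows links across network borders."""
    queue = deque(seeds)
    visited = set()
    while queue and len(visited) < max_pages:
        key = queue.popleft()
        if key in visited:
            continue
        visited.add(key)
        for link in fetch(*key):
            if link not in visited:
                queue.append(link)
    return visited


print(sorted(crawl([("net_a", "a1")])))
```

Starting from a single seed in `net_a`, the crawl reaches both pages of `net_b` because the queue does not distinguish between links inside and outside the current network.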
Online social networks (OSNs)
$G = \langle V, E \rangle$, $e_k = \langle v_i, v_j, r_t^e \rangle$, where $v_i, v_j \in V$ and $r_t^e \in R_T$
Example of entities and relations: entities are users (actors), posts and comments; relations include owner, has owner, has, belongs to, friend and follower.
The whole social network is a complex network that constantly evolves under the interactions of its actors, e.g. events that create posts, comments, etc.
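The model $G = \langle V, E \rangle$ with typed edges $e_k = \langle v_i, v_j, r_t^e \rangle$ can be sketched as a small typed graph. The relation set below is taken from the example relations above; the edge directions (who is the source of "owner" or "follower") and the class and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# R_T: relation types from the example above (assumed fixed in advance).
RELATION_TYPES = {"owner", "has owner", "has", "belongs to", "friend", "follower"}


@dataclass(frozen=True)
class Edge:
    source: str    # v_i
    target: str    # v_j
    relation: str  # r_t^e, must belong to RELATION_TYPES


@dataclass
class SocialGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> entity kind
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, kind):
        self.vertices[vid] = kind

    def add_edge(self, src, dst, relation):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be vertices of G")
        self.edges.append(Edge(src, dst, relation))

    def neighbors(self, vid, relation):
        """Targets of edges of the given type that start at `vid`."""
        return [e.target for e in self.edges
                if e.source == vid and e.relation == relation]


g = SocialGraph()
g.add_vertex("u1", "user")
g.add_vertex("u2", "user")
g.add_vertex("p1", "post")
g.add_edge("u1", "p1", "owner")      # assumed: user u1 owns post p1
g.add_edge("p1", "u1", "has owner")  # assumed: post p1 has owner u1
g.add_edge("u2", "u1", "follower")   # assumed: u2 follows u1
print(g.neighbors("u1", "owner"))
```

A flat edge list keeps the sketch short; a real system would index edges by source and relation type instead of scanning the list.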
Individual actors (users and groups) are characterized by their behavior (or profile)
Behavior is characterized by a set of events related to an individual (e.g. public communications: posts, reposts, mentions, comments, likes) that can be represented as an unevenly spaced (impulse-based) time series
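An unevenly spaced (impulse-based) time series can be stored as sorted timestamps per event kind, with no value attached to each impulse. The sketch below assumes that representation; the class name `ActorBehavior` and the half-open counting window are illustrative choices, not taken from the slides.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ActorBehavior:
    """Impulse time series of one actor's events.

    Each event is only a timestamp per event kind (post, comment, like, ...),
    so the series is a sequence of impulses at irregular times.
    """

    def __init__(self):
        self.times = defaultdict(list)  # kind -> sorted timestamps

    def add(self, kind, t):
        # insort keeps timestamps sorted even if events arrive out of order
        insort(self.times[kind], t)

    def count(self, kind, start, end):
        """Number of `kind` events with start <= t < end."""
        ts = self.times[kind]
        return bisect_left(ts, end) - bisect_left(ts, start)

    def rate(self, kind, start, end):
        """Average number of events per time unit in [start, end)."""
        return self.count(kind, start, end) / (end - start)


b = ActorBehavior()
for t in [1.3, 0.5, 7.9, 1.2]:
    b.add("comment", t)
print(b.count("comment", 0, 2), b.rate("comment", 0, 2))
```

Binary search keeps window queries logarithmic in the number of events, which matters when the same query runs over many actors.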
Use Case: deviant behavior in OSNs
Deviant behavior: behavior of an individual or a set of individuals that differs from the “standard”
The standard is a particular instance of such a model together with acceptable limits of difference
A behavior difference is composed of differences in particular events, so estimating it requires data-intensive processing of elementary events
Events: responses (with direct mentions) to the user’s messages
Example: serial trolling with insults by a user; the frequency of such events is much greater than for regular users
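One simple way to express "a standard plus acceptable limits of difference" is a mean-and-deviation rule over event frequencies. The sketch assumes the standard is fitted on a reference group of regular users (so that trolls do not inflate it) and that the limit is k sample standard deviations above the mean. This threshold rule and the counts below are hypothetical and are not the model used by the authors.

```python
from statistics import mean, stdev


def build_standard(reference_counts, k=3.0):
    """Standard = model instance (mean of a reference group) plus an
    acceptable limit of k sample standard deviations above it."""
    mu = mean(reference_counts)
    sigma = stdev(reference_counts)
    return mu, mu + k * sigma


def deviant_users(user_counts, standard):
    """Users whose event frequency exceeds the acceptable limit."""
    _, upper = standard
    return sorted(u for u, c in user_counts.items() if c > upper)


# Hypothetical weekly counts of insulting replies for regular users.
regular = [2, 3, 1, 4, 2, 3, 2, 3]
standard = build_standard(regular)
observed = {"user_a": 3, "user_b": 41, "user_c": 5}
print(deviant_users(observed, standard))
```

With these numbers the limit is about 5.28 events per week, so only `user_b` is flagged.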
Dynamical processes on complex networks
The goal is to model SIR-like processes on huge networks
Sparse networks ($E \ll N^2$) with $2^{25}$ (~33 million) to $2^{30}$ (~1 billion) nodes: memory usage is 12 GB of RAM for $2^{27}$ nodes and 50 GB for $2^{29}$ nodes, with run times of 50 to 600 seconds. Sequential simulation of 100 iterations takes 24 hours.
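A discrete-time stochastic SIR process on a sparse network can be sketched with an adjacency list, so memory grows with $E$ rather than $N^2$. The synchronous update rule, the per-contact infection probability `beta` and the recovery probability `gamma` are assumptions for illustration; the slides do not specify the exact process variant.

```python
import random


def sir_step(adj, state, beta, gamma, rng):
    """One synchronous step of a discrete-time SIR process.

    adj: node -> list of neighbours (sparse adjacency list)
    state: node -> 'S', 'I' or 'R'
    beta: per-contact infection probability; gamma: recovery probability
    """
    new_state = dict(state)
    for node, s in state.items():
        if s != "I":
            continue
        for nb in adj[node]:
            if state[nb] == "S" and rng.random() < beta:
                new_state[nb] = "I"
        if rng.random() < gamma:
            new_state[node] = "R"
    return new_state


def simulate(adj, seeds, beta, gamma, steps, seed=0):
    """Run `steps` iterations and record the number of infected nodes."""
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    for v in seeds:
        state[v] = "I"
    history = []
    for _ in range(steps):
        state = sir_step(adj, state, beta, gamma, rng)
        history.append(sum(1 for s in state.values() if s == "I"))
    return state, history


# Path graph 0-1-2-3-4 with certain infection and no recovery.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
final, hist = simulate(path, [0], beta=1.0, gamma=0.0, steps=2)
print(hist)
```

With `beta=1.0` and `gamma=0.0` the infection advances one hop per step along the path, which makes the run deterministic and easy to check.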
An effective balancing algorithm was developed for distributing network nodes among computational nodes
Experiments on supercomputers with networks of up to 1 billion nodes show a parallel efficiency of about 0.9
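The balancing algorithm itself is not described on the slide, so the sketch below swaps in a generic greedy "largest first" assignment: vertices sorted by degree go to the currently least-loaded worker, assuming the cost of a vertex is proportional to degree + 1. It ignores communication between workers. It also shows the standard definition of parallel efficiency, $E_p = T_1 / (p \cdot T_p)$, against which a value of about 0.9 can be read.

```python
import heapq


def balance(degrees, workers):
    """Greedy largest-first assignment of graph vertices to workers.

    Assumption: the work of a vertex per simulation step is proportional
    to degree + 1 (the vertex itself plus the edges it touches).
    """
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {}
    for v in sorted(degrees, key=degrees.get, reverse=True):
        load, w = heapq.heappop(heap)
        assignment[v] = w
        heapq.heappush(heap, (load + degrees[v] + 1, w))
    loads = [0] * workers
    for v, w in assignment.items():
        loads[w] += degrees[v] + 1
    return assignment, loads


def parallel_efficiency(t_serial, t_parallel, workers):
    """E_p = T_1 / (p * T_p); 1.0 means perfect linear speedup."""
    return t_serial / (workers * t_parallel)


degrees = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
assignment, loads = balance(degrees, workers=2)
print(loads, parallel_efficiency(90.0, 12.5, 8))
```

Total load here is 21, and the greedy split gives 10 and 11, close to the ideal 10.5 per worker.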
Parallel simulation of dynamical processes on complex networks
Pareto fronts of information source layouts found by a genetic algorithm (GA) and by a greedy heuristic that selects the nodes with the highest in-degree (HD). The front found by the GA is 20% greater.
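The HD baseline and the notion of a Pareto front can be sketched as follows. The in-degree heuristic follows the slide's description; the two objectives used for the front (a cost to minimize, such as the number of sources, and a coverage to maximize) are assumptions, since the slide does not name them. The GA itself is not sketched.

```python
def top_in_degree(edges, k):
    """HD heuristic: pick the k nodes with the highest in-degree.

    edges: iterable of (source, target) pairs; ties broken by node id.
    """
    indeg = {}
    for src, dst in edges:
        indeg.setdefault(src, 0)
        indeg[dst] = indeg.get(dst, 0) + 1
    return sorted(indeg, key=lambda v: (-indeg[v], v))[:k]


def pareto_front(points):
    """Non-dominated (cost, coverage) points: minimize cost, maximize coverage."""
    front = []
    for c, q in points:
        dominated = any(
            c2 <= c and q2 >= q and (c2, q2) != (c, q) for c2, q2 in points
        )
        if not dominated:
            front.append((c, q))
    return sorted(set(front))


edges = [(1, 2), (3, 2), (4, 2), (1, 3), (4, 3), (2, 5)]
print(top_in_degree(edges, 2))
print(pareto_front([(1, 10), (2, 15), (2, 12), (3, 15), (4, 20)]))
```

Comparing two layouts then amounts to comparing their fronts, e.g. the coverage each reaches at the same cost.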
Optimization on the micro-scale for macro characteristics: identifying spreading processes and adjusting parameters of individuals to generate the required events (e.g. interactions).
Optimization on the macro-scale for micro characteristics: building a subset of the population that provides individuals with the required features
Both optimizations rely on simulations to investigate the parameters
Use Case: information spreading optimization
Criminal organizations are considered as social networks that form collectives rather than formal organizations, with distinctive features such as flexible, non-hierarchical internal relations.
Use Case: disruption of criminal networks
A cannabis cultivation criminal network consists of multiple “value chains”: communities of individuals with the roles needed to carry out “production”
The goal is to develop an effective strategy to disrupt the network by breaking such “value chains”
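A value chain can be modeled as a set of members that is intact only while every required role is still covered, so removing an individual who holds a role no one else in the chain has breaks that chain. The sketch below assumes this model, a hypothetical three-role set, and a simple greedy strategy (remove the person whose removal leaves the fewest intact chains). It illustrates the idea of breaking value chains and is not the authors' disruption strategy.

```python
REQUIRED_ROLES = {"grower", "supplier", "seller"}  # hypothetical role set


def intact_chains(chains, roles, removed):
    """Count value chains whose remaining members still cover every role."""
    count = 0
    for members in chains:
        covered = {roles[m] for m in members if m not in removed}
        if REQUIRED_ROLES <= covered:
            count += 1
    return count


def greedy_disruption(chains, roles, budget):
    """Remove up to `budget` individuals, each time choosing the one whose
    removal leaves the fewest intact chains (ties broken by name)."""
    removed = set()
    people = {m for members in chains for m in members}
    for _ in range(budget):
        candidates = people - removed
        if not candidates or intact_chains(chains, roles, removed) == 0:
            break
        best = min(sorted(candidates),
                   key=lambda p: intact_chains(chains, roles, removed | {p}))
        removed.add(best)
    return removed, intact_chains(chains, roles, removed)


chains = [{"a", "b", "c"}, {"c", "d", "e"}, {"f", "g", "h", "i"}]
roles = {"a": "grower", "b": "supplier", "c": "seller",
         "d": "grower", "e": "supplier",
         "f": "grower", "g": "supplier", "h": "seller", "i": "seller"}
print(greedy_disruption(chains, roles, budget=2))
```

Individual `c` sits in two chains as their only seller, so it is removed first; the third chain has two sellers, so the greedy step targets its single grower instead.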
Approach to infrastructure management
Combination of big data and HPC in one platform in the form of a composite application with a single interface
Scheduling and management of both data-intensive and CPU-intensive workloads
Different execution layers: Mesos, supercomputers, clouds, grids
The platform’s user interface
Scheduling in the common platform
MHGH scheduling algorithm based on the time-sharing principle
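The MHGH algorithm is not described on the slide, so the sketch below only illustrates the generic time-sharing principle it is said to build on: jobs of both kinds (e.g. a hypothetical data-intensive crawl processing job and a CPU-intensive simulation) share resources in fixed time slices and go to the back of the queue until they finish. The round-robin policy, job names and quantum are assumptions.

```python
from collections import deque


def round_robin(jobs, quantum):
    """Generic time-sharing: each job runs for at most `quantum` time units,
    then goes to the back of the queue if it still has work left.

    jobs: list of (name, remaining_time). Returns completion time per job.
    """
    queue = deque(jobs)
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))
        else:
            finished[name] = clock
    return finished


jobs = [("crawl_etl", 3), ("sir_sim", 5), ("report", 1)]
print(round_robin(jobs, quantum=2))
```

Time slicing lets the short job finish early instead of waiting behind the long simulation, at the cost of stretching the completion time of the longest job.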
Conclusion
Three main components are required to handle social networks: a data source on the network state, identification of the state from the data, and simulation of the network's dynamics to predict its development.
A DSL-based multi-network crawler is responsible for data collection and produces a flow of data. This flow can be huge and requires a data processing layer based on existing big data frameworks. That layer is responsible for parameter identification on different scales to capture the actual network state. Predicting the network's evolution requires simulation on different scales and combines the use of significant volumes of data with HPC capabilities.
To use these three components effectively, they have to be combined in a single instrumentation platform that is responsible for workflow management, data delivery, and scheduling of data-intensive and CPU-intensive workloads.
THANK YOU FOR YOUR ATTENTION!