big data in a cyberspace: recognition and simulation of critical phenomena … · 2017-03-05 ·...

Moscow, 2016 Big data in a cyberspace: recognition and simulation of critical phenomena in social networks Butakov N., Bochenina K., Boukhanovsky A.


Page 1: Big data in a cyberspace: recognition and simulation of critical phenomena … · 2017-03-05 · Big data in a cyberspace: recognition and simulation of critical phenomena in social

Moscow, 2016

Big data in a cyberspace: recognition and simulation of critical phenomena in social networks

Butakov N., Bochenina K., Boukhanovsky A.


A social network is a set of interconnected actors that generate events (e.g. interactions) and thus may form different processes, such as information or infection spreading in the population.

A critical phenomenon is essentially a sharp change in the behavior of a subset of individuals or of the whole network that significantly affects the population and changes the properties of the network or its processes.

The goal is to identify precursors of such phenomena and to estimate possible developments of the situation and the level of intervention required

Social networks


To reach the goal, a data-driven approach is required

Adaptivity to the concrete situation through incoming data -> better solutions

Flexibility in decisions compared with a precedent-based approach (a new solution may emerge)

Requires fast processing of huge amounts of data

Data-driven approach


Three scales of a social network:

Macro-scale – modeling of quantitative characteristics of the population

Meso-scale – modeling of changes in state for individual actors

Micro-scale – modeling of events that happen due to actors' interactions

Scales of a social network


The first step is the data source: a crawler

Distributed and multichannel data collection

Multitenancy

Support for multiple networks and smooth automatic transition between them: moving beyond the borders of a particular network

Data source for big data processing systems

Appropriate capture of the network state may require not only observing the initial network but also tracing processes and interactions between independent networks – moving from site to site


Online social networks (OSNs)

$G = \langle V, E \rangle$

$e_k = \langle v_i, v_j, rt_e \rangle$

$v_i, v_j \in V$

$rt_e \in RT$

Example of entities and relations:

Entities: post, comment, user (actor)

Relations: owner, has owner, has, belongs to, friend, follower
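The graph definition above can be sketched as a small data structure. This is only an illustration: the class and method names (`Edge`, `TypedGraph`, `add_edge`, `neighbors`) and the vertex ids are hypothetical, and the relation types are simply the labels listed on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relation types RT, taken from the labels on the slide (assumed to be the full set).
RELATION_TYPES = {"owner", "has owner", "has", "belongs to", "friend", "follower"}


@dataclass(frozen=True)
class Edge:
    source: str    # v_i
    target: str    # v_j
    relation: str  # rt_e, a member of RELATION_TYPES


class TypedGraph:
    """Directed multigraph G = <V, E> whose edges carry a relation type."""

    def __init__(self):
        self.vertices = set()
        self.edges = []
        self._out = defaultdict(list)

    def add_edge(self, source, target, relation):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation!r}")
        edge = Edge(source, target, relation)
        self.vertices.update((source, target))
        self.edges.append(edge)
        self._out[source].append(edge)
        return edge

    def neighbors(self, vertex, relation):
        """Targets reachable from `vertex` over edges of one relation type."""
        return [e.target for e in self._out[vertex] if e.relation == relation]


# Hypothetical example: one user, one post, one comment, one follower.
graph = TypedGraph()
graph.add_edge("user:alice", "post:1", "has")
graph.add_edge("post:1", "user:alice", "has owner")
graph.add_edge("post:1", "comment:7", "has")
graph.add_edge("user:bob", "user:alice", "follower")
```

Storing the relation type on the edge rather than in separate graphs keeps posts, comments and users in one structure, which matches the single-graph notation above.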

The whole social network is a complex network that constantly evolves through the interactions of its actors, e.g. events that create posts, comments, etc.

Individual actors (users and groups) are characterized by their behavior (or profile)

Behavior is characterized by a set of events related to an individual (e.g. public communications: posts, reposts, mentions, comments, likes) that can be represented as an unevenly spaced (impulse-based) time series
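As a minimal sketch of what an unevenly spaced event series looks like in practice, the functions below derive two simple descriptions from raw event timestamps: the gaps between consecutive events and the event counts per fixed window. The function names and the sample timestamps are assumptions made for illustration; the slides do not specify the actual model.

```python
from datetime import datetime


def inter_event_intervals(timestamps):
    """Seconds between consecutive events of one actor, in time order."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def events_per_window(timestamps, window_seconds):
    """Count events in consecutive fixed-width windows starting at the first event."""
    ordered = sorted(timestamps)
    if not ordered:
        return []
    start = ordered[0]
    span = (ordered[-1] - start).total_seconds()
    counts = [0] * (int(span // window_seconds) + 1)
    for t in ordered:
        counts[int((t - start).total_seconds() // window_seconds)] += 1
    return counts


# Hypothetical events (posts, comments, likes) of a single actor.
events = [
    datetime(2016, 1, 1, 12, 0, 0),
    datetime(2016, 1, 1, 12, 0, 30),
    datetime(2016, 1, 1, 12, 5, 0),
    datetime(2016, 1, 1, 13, 0, 0),
]
intervals = inter_event_intervals(events)  # irregular gaps, in seconds
hourly = events_per_window(events, 3600)   # regular series derived from the impulses
```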


Use Case: deviant behavior in OSNs

Deviant behavior – behavior of an individual or a set of individuals that differs from the “standard”

The standard is a particular instance of such a model together with acceptable limits of deviation

Behavioral differences are composed of differences in particular events, so estimating them requires data-intensive processing of elementary events

Events: responses (with direct mentions) to the user’s messages

Example of serial trolling by a user with insults: the frequency of such events is much greater than for regular users
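The frequency comparison in this example can be sketched as follows. The choice of "standard" (the population median) and of the acceptable limit (a multiple of the median absolute deviation) are assumptions made for this illustration, as are the function name and the sample rates; the slides do not say how the standard is actually defined.

```python
from statistics import median


def find_deviant_users(event_rates, limit=5.0):
    """Return users whose event rate exceeds the standard by more than `limit` MADs.

    event_rates maps a user id to events per day. The standard is taken to be
    the population median, and the acceptable limit is expressed in multiples
    of the median absolute deviation (MAD); both choices are assumptions.
    """
    rates = list(event_rates.values())
    standard = median(rates)
    mad = median([abs(r - standard) for r in rates])
    if mad == 0:
        return []  # no spread in the population, so no limit can be derived
    return sorted(
        user for user, rate in event_rates.items()
        if (rate - standard) / mad > limit
    )


# Hypothetical rates of responses with direct mentions, per user per day.
rates = {"u1": 2.0, "u2": 3.0, "u3": 2.5, "u4": 1.5, "troll": 40.0}
deviant = find_deviant_users(rates)
```

A median-based limit is used here because a single extreme user would inflate a mean and standard deviation and could hide itself.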


Dynamical processes on complex networks

The goal is to model SIR-like processes on huge networks

For sparse networks (E << N^2) with 2^25 (~33 million) to 2^30 (~1 billion) nodes: 12 GB of RAM for 2^27 nodes, 50 GB for 2^29 nodes, 50–600 seconds. Sequential simulation of 100 iterations takes 24 hours.

Developed an effective balancing algorithm for distributing network nodes among computational nodes

Experiments on supercomputers for networks of up to 1 billion nodes show a parallel efficiency of about 0.9
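A rough reading of these figures, under two assumptions that the slides do not state: that the memory sizes are binary gigabytes, and that parallel efficiency is the usual ratio of speedup to the number of processes $p$, with $T_1$ the sequential run time and $T_p$ the run time on $p$ processes.

```latex
\[
\frac{12 \cdot 2^{30}\ \text{bytes}}{2^{27}\ \text{nodes}} = 96\ \text{bytes per node},
\qquad
\frac{50 \cdot 2^{30}\ \text{bytes}}{2^{29}\ \text{nodes}} = 100\ \text{bytes per node}
\]
\[
E_p = \frac{T_1}{p \, T_p} \approx 0.9
\]
```

Under these assumptions the memory footprint stays at roughly 100 bytes per network node across both reported sizes.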

Parallel simulation of dynamical processes on complex networks
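For orientation, a minimal sequential sketch of a discrete-time SIR process on a sparse graph is given below. It is not the parallel implementation described on the slide: the update rule, the parameter names `beta` and `gamma`, and the ring-shaped test network are assumptions chosen to keep the example small.

```python
import random


def simulate_sir(adjacency, initially_infected, beta, gamma, steps, seed=0):
    """Discrete-time SIR on a graph given as {node: [neighbors]}.

    In each step every infected node infects each susceptible neighbor with
    probability beta and then recovers with probability gamma.
    Returns a list of (S, I, R) counts: the initial state plus one per step.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in adjacency}
    for node in initially_infected:
        state[node] = "I"

    def counts():
        values = list(state.values())
        return values.count("S"), values.count("I"), values.count("R")

    history = [counts()]
    for _ in range(steps):
        infected = [n for n, s in state.items() if s == "I"]
        newly_infected, recovered = set(), []
        for node in infected:
            for neighbor in adjacency[node]:
                if state[neighbor] == "S" and rng.random() < beta:
                    newly_infected.add(neighbor)
            if rng.random() < gamma:
                recovered.append(node)
        # Apply all changes at once so the step is a synchronous update.
        for node in newly_infected:
            state[node] = "I"
        for node in recovered:
            state[node] = "R"
        history.append(counts())
    return history


# Hypothetical sparse network: a ring of 200 nodes with one initial infection.
n = 200
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
history = simulate_sir(ring, [0], beta=0.5, gamma=0.2, steps=50)
```

Adjacency lists keep memory proportional to the number of edges, which is what makes sparse networks (E << N^2) tractable at large sizes.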


Pareto fronts of information source layouts found by a genetic algorithm (GA) and by a greedy heuristic that selects the nodes with the highest in-degree (HD). The first one (GA) is 20% greater.

Optimization on the micro-scale for macro characteristics: identification of spreading processes, adjusting parameters of individuals to generate required events (e.g. interactions).

Optimization on the macro-scale for micro characteristics: building a subset of the population to provide individuals with required features

Both optimizations rely on simulations to investigate parameters

Use Case: information spreading optimization
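The greedy baseline mentioned on this slide (choosing the nodes with the highest in-degree as information sources) can be sketched as below. The function name, the edge-list input format and the tie-breaking rule are assumptions; the genetic algorithm it is compared against is not reproduced here.

```python
from collections import Counter


def top_in_degree_nodes(edges, k):
    """Pick the k nodes with the highest in-degree from (source, target) edges.

    Ties are broken by node id so the result is deterministic. Nodes with no
    incoming edges are never selected.
    """
    in_degree = Counter(target for _, target in edges)
    ranked = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


# Hypothetical directed edges, e.g. (follower, followed).
edges = [
    ("a", "c"), ("b", "c"), ("d", "c"),
    ("a", "b"), ("d", "b"),
    ("c", "a"),
]
sources = top_in_degree_nodes(edges, 2)
```

A heuristic like this looks only at local degree, which is one plausible reason a search-based method that evaluates layouts by simulation can do better.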


Criminal organizations are treated as social networks that form collectives rather than formal organizations, with distinctive features such as flexible, non-hierarchical internal relations.

Use Case: disruption of criminal networks

A cannabis cultivation criminal network consists of multiple “value chains”: communities of individuals with the roles needed for “production”

The goal is to develop an effective strategy to disrupt the network by breaking such “value chains”
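One way to make "breaking a value chain" concrete is sketched below, under strong assumptions: a chain is treated as functional only while every role in it is filled by at least one remaining actor, and actors are removed greedily. The role names, actor ids and both function names are hypothetical, and the slides do not describe the actual disruption strategy.

```python
def intact_chains(value_chains, removed):
    """Value chains that still have at least one remaining member per role.

    value_chains maps a chain name to {role: set of actor ids}.
    """
    removed = set(removed)
    return [
        name
        for name, roles in value_chains.items()
        if all(members - removed for members in roles.values())
    ]


def greedy_disruption(value_chains, budget):
    """Remove up to `budget` actors, each time picking the actor whose removal
    leaves the fewest intact chains (ties go to the smallest actor id)."""
    actors = sorted(
        {a for roles in value_chains.values() for m in roles.values() for a in m}
    )
    removed = []
    for _ in range(budget):
        if not intact_chains(value_chains, removed):
            break  # every chain is already broken
        best = min(
            (a for a in actors if a not in removed),
            key=lambda a: len(intact_chains(value_chains, removed + [a])),
            default=None,
        )
        if best is None:
            break
        removed.append(best)
    return removed


# Hypothetical network: three chains that share a financier and a seller.
chains = {
    "chain_1": {"grower": {"g1"}, "financier": {"f1"}, "seller": {"s1", "s2"}},
    "chain_2": {"grower": {"g2"}, "financier": {"f1"}, "seller": {"s3"}},
    "chain_3": {"grower": {"g3"}, "financier": {"f2"}, "seller": {"s1"}},
}
targets = greedy_disruption(chains, budget=2)
```

In this toy network the shared financier is removed first because it is the only actor whose removal breaks two chains at once.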


Approach to infrastructure management

Combination of big data and HPC in one platform in the form of a composite application with a single interface

Scheduling and management of both data-intensive and CPU-intensive workloads

Different execution layers: Mesos, supercomputers, clouds, grids


The platform’s user interface


Scheduling in the common platform

MHGH scheduling algorithm based on the time-sharing principle


Conclusion

Three main components are required to handle social networks: a data source on the network state, identification of the state from the data, and simulation of the network’s dynamics to predict its development

A DSL-based multi-network crawler is responsible for data collection and produces a flow of data. This flow can be huge and requires a data processing layer based on existing big data frameworks. That layer is responsible for parameter identification on different scales to capture the actual network state. Prediction of network evolution requires simulation on different scales and combines significant volumes of data with HPC capabilities

To use these three components effectively, they have to be combined in a single instrumentation platform that is responsible for workflow management, data delivery, and scheduling of data-intensive and CPU-intensive workloads.


THANK YOU FOR YOUR ATTENTION!

Moscow, 2016.