different approaches to community evolution prediction in blogosphere

DIFFERENT APPROACHES TO COMMUNITY EVOLUTION PREDICTION IN BLOGOSPHERE Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak

Upload: austin-mcintyre

Post on 30-Dec-2015



TRANSCRIPT

Page 1: Different Approaches to Community Evolution Prediction in Blogosphere

DIFFERENT APPROACHES TO COMMUNITY EVOLUTION PREDICTION IN BLOGOSPHERE

Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak

Page 2: Different Approaches to Community Evolution Prediction in Blogosphere

OUTLINE:

Introduction and motivation
Methods of events identification in group evolution:
    SGCI
    GED
Predicting group evolution in the social network
Dataset and experiment setup
Classifiers – reminder
For each method, comparison of results between different classifiers
Conclusion


Page 3: Different Approaches to Community Evolution Prediction in Blogosphere

GENERAL IDEA

Predicting the future direction of community evolution makes it possible to determine which characteristics describing communities matter for their future behavior.


Page 4: Different Approaches to Community Evolution Prediction in Blogosphere

MOTIVATION

Supports decisions about investing in contact with members of a given community and about carrying out actions to achieve a key position in it.

Makes it possible to determine effective ways of forming opinions.

Makes it possible to protect group participants against such activities.


Page 5: Different Approaches to Community Evolution Prediction in Blogosphere

INTRODUCTION – PREDICTION

Link prediction (the best-investigated problem): predicting the existence of a link (relation) between two nodes (users) within a social network.
Liben-Nowell focused on paths and common neighbours between a pair of nodes.
Lichtenwalter considered degrees and mutual information between them.
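As a minimal sketch of the common-neighbours idea mentioned above (not the exact method of the cited authors), every pair of nodes that is not yet linked can be scored by the number of neighbours the two nodes share. The toy network and the function name are hypothetical.

```python
from itertools import combinations


def common_neighbour_scores(adjacency):
    """Score every non-adjacent node pair by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the link already exists, nothing to predict
        scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores


# Hypothetical toy network: undirected links stored as neighbour sets.
toy_network = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

# Pairs with the highest score are the most likely future links.
ranking = sorted(common_neighbour_scores(toy_network).items(),
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

Here the only missing link is between "a" and "d", which share two neighbours.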


Page 6: Different Approaches to Community Evolution Prediction in Blogosphere

INTRODUCTION – PREDICTION

Link sign prediction: sign in this context means that the predicted relation between users may be positive or negative.
Symeonidis looked at paths between the node pair and used the notion of similarity to predict the sign.
Leskovec used degree and mutual information between a pair of nodes for link prediction, and drew on the theories of balance and status to predict the link sign.

Richter and Wai-Ho addressed the important task of churn prediction (churn: the number of individuals moving out of a collective over a specific period of time).

Richter presented a new approach that predicts churn based on an analysis of group behavior. This approach touches another aspect, not well studied yet, in which the evolution of the whole group is predicted, i.e. which event will be next in the group's lifetime.


Page 7: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTION OF THE GROUP EVOLUTION

What is a group? A set of vertices which communicate with each other more frequently than with vertices outside of the group.

A new method for future event prediction has been developed, based on the stable group changes identification (SGCI) algorithm.

Prediction in this method is made based on:
    previous events in the group's lifetime, extracted by SGCI
    the group profile, described by group size, cohesion, leadership and density


Page 8: Different Approaches to Community Evolution Prediction in Blogosphere

METHODS OF EVENTS IDENTIFICATION IN GROUP EVOLUTION


Page 9: Different Approaches to Community Evolution Prediction in Blogosphere

SGCI ALGORITHM

Stable group changes identification

Step 1. Identification of fugitive groups in the separate time frames.
    The whole network is divided into time frames.
    In each time frame a method of finding communities in the network is applied.

Step 2. Identification of group continuation – assigning transitions between groups in neighboring time steps.
    After extracting communities in the time frames, the communities from neighboring time frames are matched and the algorithm assigns transitions between them (from a group in time frame t to a group in time frame t+1).


Page 10: Different Approaches to Community Evolution Prediction in Blogosphere

SGCI ALGORITHM


For each pair of non-empty groups A, B from neighboring time slots we calculate:
• MJ (modified Jaccard measure)
• ds (difference in size)

If MJ(A,B) is above a defined threshold and ds(A,B) is no more than a specified limit, then the algorithm creates a transition between these groups.
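The matching rule can be sketched as follows. The slide does not spell out MJ and ds, so the two definitions below are assumptions: MJ is taken as the overlap relative to the smaller group, and ds as the ratio of the larger group size to the smaller one. The default thresholds 0.5 and 50 are the parameter values quoted later in the experiment setup; the function names and example groups are hypothetical.

```python
def modified_jaccard(a, b):
    """Assumed definition: overlap relative to the smaller group, 0 for empty groups."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return max(common / len(a), common / len(b))


def size_difference(a, b):
    """Assumed definition: ratio of the larger group size to the smaller one."""
    return max(len(a) / len(b), len(b) / len(a))


def find_transitions(groups_t, groups_t1, mj_threshold=0.5, ds_limit=50):
    """Return (i, j) index pairs for which a transition t -> t+1 is created."""
    transitions = []
    for i, a in enumerate(groups_t):
        for j, b in enumerate(groups_t1):
            if not a or not b:
                continue  # only non-empty groups are compared
            if (modified_jaccard(a, b) > mj_threshold
                    and size_difference(a, b) <= ds_limit):
                transitions.append((i, j))
    return transitions


# Hypothetical groups (sets of user ids) in two neighboring time frames.
frame_t = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
frame_t1 = [{1, 2, 3, 4, 5, 11}, {6, 7, 20, 21, 22}]
print(find_transitions(frame_t, frame_t1))
```

Only the first pair is matched: the second pair shares just two of five members, so its MJ value of 0.4 stays below the threshold.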

Page 11: Different Approaches to Community Evolution Prediction in Blogosphere

SGCI ALGORITHM

Step 3. Separation of the stable groups (lasting for at least the required number of subsequent time steps). In this step, the stable groups are retrieved.

Step 4. Identification of types of group changes. Events describing the change of the group's state are assigned to each transition between stable groups from neighboring time frames. We can define several types of group changes (A and B are the groups from the first and the second time frame of a transition, respectively); sh and dh are thresholds.


Page 12: Different Approaches to Community Evolution Prediction in Blogosphere

SGCI ALGORITHM

addition - a small group attaches to a large one.
deletion - a small group detaches from a large one.
merge - many groups in one time frame form a new, larger group in the next time frame.
split - a group divides into several smaller groups in the next time frame.
split_merge - a group divides into at least 2 groups in the next time frame, and one of these groups is the result of merging with another group from the previous time frame.
constancy - simple transition without a significant change of the group size.
change size - simple transition with a change of the group size.
decay - the group does not exist in the next time frame.

[Formulas on the slide: size conditions for addition, deletion, constancy and change size, expressed with the thresholds sh and dh.]


Page 13: Different Approaches to Community Evolution Prediction in Blogosphere

SGCI ALGORITHM

A given group can be assigned more than one event leading to groups in the next time frame. Some events can coexist with others, but some cannot.

The constancy event cannot coexist with the change size, merge or split events.

The constancy event can coexist with the addition or deletion events.

The addition and deletion events can coexist with every event type except the decay event. The decay event is always the only event for the group.
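A small sketch of how these coexistence rules could be checked for a set of events assigned to one group. It encodes only the rules stated on this slide; treating every combination not mentioned here as allowed is an assumption, and the function name is hypothetical.

```python
def events_can_coexist(events):
    """Check a set of event names against the coexistence rules listed above."""
    events = set(events)
    # Decay is always the only event of a group.
    if "decay" in events:
        return len(events) == 1
    # Constancy excludes change size, merge and split.
    if "constancy" in events and events & {"change size", "merge", "split"}:
        return False
    # Assumption: any other combination (e.g. with addition or deletion) is allowed.
    return True


print(events_can_coexist({"constancy", "addition"}))
print(events_can_coexist({"constancy", "split"}))
```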


Page 14: Different Approaches to Community Evolution Prediction in Blogosphere

GED: GROUP EVOLUTION DISCOVERY

For the GED method we calculate the inclusion measure, which evaluates the inclusion of one group in another. The inclusion of group G1 in group G2 is given by a formula with two parts, labelled group quantity and group quality*.

NIG1(x) – the importance of the node x in group G1. The GED method takes into account both the quantity and the quality of the group members.

* Quality can be expressed by any user importance measure, e.g. degree centrality, betweenness centrality, PageRank, social position etc.
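The inclusion formula itself is not present in the transcript. The sketch below assumes the form commonly given for GED: the share of G1's members that also belong to G2 (group quantity) multiplied by the share of G1's total node importance carried by those common members (group quality). The importance values and names are hypothetical.

```python
def inclusion(g1, g2, importance):
    """Inclusion of g1 in g2: quantity factor times quality factor (assumed form)."""
    if not g1:
        return 0.0
    common = g1 & g2
    # Group quantity: fraction of g1's members that are also in g2.
    quantity = len(common) / len(g1)
    # Group quality: fraction of g1's node importance held by the common members.
    total_importance = sum(importance[x] for x in g1)
    if total_importance == 0:
        return 0.0
    quality = sum(importance[x] for x in common) / total_importance
    return quantity * quality


# Hypothetical node importance NI_G1(x) of each member of G1.
importance_in_g1 = {"a": 0.5, "b": 0.3, "c": 0.2}
g1 = {"a", "b", "c"}
g2 = {"a", "b", "d"}
print(inclusion(g1, g2, importance_in_g1))
```

In this example two of three members are shared (quantity 2/3) and they hold 80% of the importance, so the inclusion is roughly 0.53.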

Page 15: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION IN THE SOCIAL NETWORK


Page 16: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

This approach to predicting future group events employs a classifier.

Structure: sequences of 3 states of a group (the present time frame and the two previous ones).


Page 17: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

Measures for the state of each group:

leadership - measure describing centralization in a graph or group (the largest value is for a star network),
where d max is the maximum value of degree in the group and n is the number of nodes in the group.

density - measure expressing how many connections between nodes are present in the network in relation to all possible connections between them [16],
where a(i,j) = 1 when there is a connection from node i to node j.
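The formulas for these two measures are missing from the transcript. The sketch below assumes the usual degree-centralization form for leadership, the sum of (d max − d_i) divided by (n − 1)(n − 2), which equals 1 for a star network, and a directed density that divides the number of present connections by n(n − 1). Function names and example data are hypothetical.

```python
def leadership(degrees):
    """Degree centralization: sum of (d_max - d_i) over (n - 1)(n - 2)."""
    n = len(degrees)
    if n < 3:
        return 0.0  # the normalizing term is zero for fewer than 3 nodes
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


def density(nodes, edges):
    """Share of present directed connections among all n(n - 1) possible ones."""
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(1 for i, j in edges if i != j and i in nodes and j in nodes)
    return present / (n * (n - 1))


# Star network with 5 nodes: one hub of degree 4 and four leaves of degree 1.
print(leadership([4, 1, 1, 1, 1]))

# Three nodes with three of the six possible directed connections present.
print(density({1, 2, 3}, {(1, 2), (2, 1), (1, 3)}))
```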


Page 18: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

Measures for the state of each group – cont.:

cohesion - measure characterizing the strength of connections inside the group in relation to connections going outside the group (from group members),
where w is the function assigning weights between nodes, G is the group, n is the number of nodes in the group and N is the number of nodes in the network.

group size - number of nodes in the group.
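The cohesion formula is also missing from the transcript. The sketch below is one plausible reading of the description: the average weight per ordered pair inside the group divided by the average weight per ordered member-to-outsider pair. The normalization by n(n − 1) and n(N − n) is an assumption and may differ from the slide's formula; names and weights are hypothetical.

```python
def cohesion(group, all_nodes, weight):
    """Average internal weight divided by average member-to-outsider weight."""
    n, total = len(group), len(all_nodes)
    outside = all_nodes - group
    if n < 2 or not outside:
        return 0.0
    internal = sum(weight.get((i, j), 0.0) for i in group for j in group if i != j)
    external = sum(weight.get((i, j), 0.0) for i in group for j in outside)
    if external == 0:
        return float("inf")  # no connection leaves the group at all
    # Assumed normalization: number of possible ordered pairs of each kind.
    return (internal / (n * (n - 1))) / (external / (n * (total - n)))


# Hypothetical weighted, directed connections w(i, j).
w = {(1, 2): 4.0, (2, 1): 2.0, (1, 3): 1.0}
print(cohesion({1, 2}, {1, 2, 3, 4}, w))
```

With these weights the internal average is 3.0 and the external average is 0.25, giving a cohesion of 12.0.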


Page 19: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

The described sequence of group states is the input for the classifier. The predicted variable is the dominating next event for the last group in the sequence.


Page 20: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

Dominating event - one of the events assigned to a given group: the event with the highest priority among the assigned events is chosen. We use the following order of events (from the highest priority to the lowest one): constancy, change size, split, merge, addition, deletion, split_merge, decay.
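Choosing the dominating event from this priority order can be sketched in a few lines. The list reproduces the order stated above; the function name is hypothetical.

```python
# Priority order from the slide, highest priority first.
PRIORITY = ["constancy", "change size", "split", "merge",
            "addition", "deletion", "split_merge", "decay"]


def dominating_event(events):
    """Return the assigned event that comes first in the priority order."""
    return min(events, key=PRIORITY.index)


print(dominating_event({"addition", "change size"}))
```

For a group with the events change size and addition, the function returns change size, because it precedes addition in the order.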


Page 21: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING SGCI RESULTS

The group Gn,1 has two assigned events: change size and addition. The dominating event for group Gn,1 is therefore change size, because this event has the higher priority.


Page 22: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING GED RESULTS

The idea is to use a simple sequence as input for the classifier: the preceding group profiles and events. The learnt model will be able to produce very good results even for simple classifiers.

The sequences of group sizes and events between time frames can be extracted from the GED results. For each event, four group profiles in the four previous time frames, together with the three associated events, are identified as the input for the classification model, separately for each group.

A single group in a given time frame (Tn) is a case (instance) for classification, for which its event TnTn+1 is being predicted.


Page 23: Different Approaches to Community Evolution Prediction in Blogosphere

PREDICTING GROUP EVOLUTION USING GED RESULTS

The sequence presented in Figure 2 is used as the input for classification.

The first part of the sequence is used as input features (variables): the group profiles per time frame and the event types between them.

The goal of classification is to predict (classify) the type of Event TnTn+1 out of the six possible classes: growing, continuing, shrinking, dissolving, merging and splitting. Forming was excluded since it can only start a sequence.
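A minimal sketch of how one classification instance could be assembled from such a sequence: four group profiles and the three events between them become the features, and the event TnTn+1 becomes the label. The feature naming scheme and the example values are hypothetical; the class list includes merging, which appears among the classified GED events in the experiments.

```python
GED_CLASSES = ["growing", "continuing", "shrinking",
               "dissolving", "merging", "splitting"]


def build_instance(profiles, events, next_event):
    """Flatten 4 group profiles and the 3 events between them into one labelled row."""
    if len(profiles) != 4 or len(events) != 3:
        raise ValueError("expected 4 profiles and 3 events")
    if next_event not in GED_CLASSES:
        raise ValueError(f"unknown class: {next_event}")
    features = {}
    for step, profile in enumerate(profiles):
        for name, value in profile.items():
            features[f"t{step}_{name}"] = value
    for step, event in enumerate(events):
        features[f"event_{step}_{step + 1}"] = event
    return features, next_event


# Hypothetical group history: profiles reduced to the group size only.
history = [{"size": 5}, {"size": 7}, {"size": 7}, {"size": 6}]
row, label = build_instance(history, ["growing", "continuing", "shrinking"], "splitting")
print(row, label)
```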


Page 24: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Dataset description:

Data from www.salon24.pl, which contains many blogs (mainly political)
26,722 users
285,532 posts
4,173,457 comments
For tests we will use half of the data set: 04/04/2010 – 31/03/2012
Each time frame lasts 7 days
Time frames overlap each other by 4 days
Yields a total of 182 time frames
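Splitting a period into overlapping time frames can be sketched as below. The stride is taken as frame length minus overlap, which is an assumption about how the overlap is defined; the number of frames obtained for a given period depends on that convention. The function name and the short example period are hypothetical.

```python
from datetime import date, timedelta


def time_frames(start, end, length_days=7, overlap_days=4):
    """Yield (first_day, last_day) for overlapping frames that fit inside [start, end]."""
    step = timedelta(days=length_days - overlap_days)  # assumed stride
    span = timedelta(days=length_days - 1)
    first = start
    while first + span <= end:
        yield first, first + span
        first += step


# Hypothetical two-week period, used only to show the windowing.
frames = list(time_frames(date(2010, 4, 4), date(2010, 4, 17)))
for first_day, last_day in frames:
    print(first_day, last_day)
```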

Page 25: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Group extraction: after the separation of time frames, the groups were extracted in each of the time frames.

This was done using the CPM method (CPMd version) from the CFinder tool (http://www.cfinder.org/) for k=5. CFinder is a tool for finding and visualizing overlapping dense groups of nodes in networks, based on the Clique Percolation Method (CPM).
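To illustrate the idea behind clique percolation, here is a brute-force sketch of the undirected variant: all k-cliques are enumerated, cliques sharing k − 1 nodes are chained together, and each chain forms one community. This is not CFinder and not the directed CPMd version used in the experiments, and it is only practical for tiny graphs; the example uses k=3 rather than k=5 to keep it small.

```python
from itertools import combinations


def k_clique_communities(adjacency, k):
    """Undirected clique percolation: union of k-cliques chained by k-1 shared nodes."""
    cliques = [frozenset(c) for c in combinations(sorted(adjacency), k)
               if all(v in adjacency[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)


# Hypothetical toy graph: two adjacent triangles and one separate triangle.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(k_clique_communities(adjacency, 3))
```

The triangles {1, 2, 3} and {2, 3, 4} share two nodes and merge into one community, while {5, 6, 7} stays separate even though node 4 is linked to node 5.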


Page 26: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Group sizes: as Figure 3 shows, there are many small groups, and groups of size 5 outnumber the others.


Page 27: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Experiment setup:

SGCI method experiments were conducted using the following parameters: MJ=0.5, ds=50, sh=10 and dh=0.05.

The GED method was run on the dataset with all combinations of GED parameters from the set:
Quantity: {50%, 60%, 70%, 80%, 90%, 100%}.
Quality (node importance): the social position measure was utilized (a measure similar to PageRank).


Reminder: the GED inclusion formula has two parts, group quantity and group quality.

Page 28: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Experiment setup:

To describe the group profile, its size, density, cohesion and leadership were used.
Seven different classifiers were utilized with default settings.
All classifiers were utilized for both approaches: SGCI and GED.


Page 29: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Classifiers – reminder:

What is a classifier? An adaptive system that learns to perform the best action given its input: identifying to which of a set of categories (sub-populations) a new observation belongs.

What is multiclass classification? Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.


Page 30: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Multi-class classification, direct approaches:
    Nearest Neighbor
    Generative approach & Naïve Bayes
    Linear classification

Multi-label classification:


Nested / Hierarchical:
• Is it edible?
• Is it sweet?
• Is it a fruit?
• Is it a banana?

Exclusive / Multi-class:
• Is it a banana?
• Is it an apple?
• Is it an orange?
• Is it a pineapple?

General / Structured:
• Is it a banana?
• Is it yellow?
• Is it sweet?
• Is it round?

Page 31: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Multi-Class Classification – real world examples:


Task                                Number of classes
Digit recognition                   10
Phoneme recognition                 50
Automated protein classification    300-600
Object recognition                  100

Page 32: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

A Simple Idea: One-vs-All Classification

Pick a good technique for building binary classifiers. Build N different binary classifiers. For the i'th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i. Let fi be the i'th classifier. Classify with f(x) = arg max_i fi(x).

A single classifier is trained per class to distinguish that class from all other classes.
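The one-vs-all scheme can be sketched with any binary scorer. The toy scorer below (distance to the centroid of the negative examples minus distance to the centroid of the positive examples) is only a stand-in chosen to keep the example short; it is not one of the classifiers used in the experiments, and all names and data points are hypothetical.

```python
def train_binary(points, labels):
    """Toy binary scorer: distance to the negative centroid minus distance to the positive one."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    pos = centroid([p for p, y in zip(points, labels) if y])
    neg = centroid([p for p, y in zip(points, labels) if not y])
    return lambda x: dist(x, neg) - dist(x, pos)


def train_one_vs_all(points, classes):
    """Build one binary scorer f_i per class: class i against all other classes."""
    return {c: train_binary(points, [y == c for y in classes]) for c in set(classes)}


def classify(scorers, x):
    """Pick the class whose scorer returns the highest value."""
    return max(scorers, key=lambda c: scorers[c](x))


# Hypothetical 2-D training points for three classes.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
classes = ["A", "A", "B", "B", "C", "C"]
scorers = train_one_vs_all(points, classes)
print(classify(scorers, (0.2, 0.5)), classify(scorers, (5, 5)), classify(scorers, (9, 1)))
```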


Page 33: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Leadership  Density  Cohesion  Group size  Group
0.85        0.43     0.28      5           A
0.71        0.48     0.30      6           A
0.65        0.89     0.99      13          B
0.97        0.53     0.62      18          A
0.47        0.56     0.75      4           C
0.21        0.12     0.93      5           B
0.35        0.39     0.92      6           B
0.37        0.42     0.48      9           C
0.88        0.18     0.21      9           A
0.46        0.72     0.84      6           B
0.12        0.85     0.87      14          B
0.28        0.27     0.35      12          A

Decision tree, first test: LEADERSHIP > 0.7

Page 45: Different Approaches to Community Evolution Prediction in Blogosphere

DATASET AND EXPERIMENT SETUP

Complete decision tree for the table of twelve groups (Leadership, Density, Cohesion, Group size → Group):

LEADERSHIP > 0.7: A
LEADERSHIP < 0.7: DENSITY
    DENSITY < 0.2: B
    DENSITY > 0.2: COHESION
        COHESION > 0.8: B
        COHESION < 0.8: GROUP SIZE
            GROUP SIZE < 10: C
            GROUP SIZE > 10: A
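The finished tree can be written as nested tests and checked against the twelve rows of the slide's table. The rows below are as read from the table (leadership, density, cohesion, group size, group); the function name is hypothetical, and values exactly on a threshold, which the slide leaves undefined, fall into the "less than" branch here.

```python
def classify_group(leadership, density, cohesion, group_size):
    """Walk the decision tree from the slide and return the predicted group label."""
    if leadership > 0.7:
        return "A"
    if density < 0.2:
        return "B"
    if cohesion > 0.8:
        return "B"
    return "C" if group_size < 10 else "A"


# Rows of the example table: leadership, density, cohesion, group size, group.
TRAINING_ROWS = [
    (0.85, 0.43, 0.28, 5, "A"), (0.71, 0.48, 0.30, 6, "A"),
    (0.65, 0.89, 0.99, 13, "B"), (0.97, 0.53, 0.62, 18, "A"),
    (0.47, 0.56, 0.75, 4, "C"), (0.21, 0.12, 0.93, 5, "B"),
    (0.35, 0.39, 0.92, 6, "B"), (0.37, 0.42, 0.48, 9, "C"),
    (0.88, 0.18, 0.21, 9, "A"), (0.46, 0.72, 0.84, 6, "B"),
    (0.12, 0.85, 0.87, 14, "B"), (0.28, 0.27, 0.35, 12, "A"),
]

correct = sum(classify_group(*row[:4]) == row[4] for row in TRAINING_ROWS)
print(f"{correct} of {len(TRAINING_ROWS)} rows classified correctly")
```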

Page 46: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results

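The measure selected is the F-measure (also known as the F1-measure), which summarises the accuracy of the result by combining precision and recall.

precision = correctly predicted events of a class / all events predicted as that class

recall = correctly predicted events of a class / all events that actually belong to that class

The F-measure is: F = 2 · precision · recall / (precision + recall)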
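A short sketch of the per-class F-measure, using the standard definitions of precision and recall. The event labels in the example are made up for illustration and are not results from the experiments.

```python
def f_measure(actual, predicted, event):
    """F1 for one event type: harmonic mean of precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == event and p == event)
    fp = sum(1 for a, p in pairs if a != event and p == event)
    fn = sum(1 for a, p in pairs if a == event and p != event)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical true and predicted events for five groups.
actual = ["addition", "addition", "decay", "split", "decay"]
predicted = ["addition", "decay", "decay", "split", "addition"]
for event in ("addition", "decay", "split"):
    print(event, f_measure(actual, predicted, event))
```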


Page 47: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results

Results of event prediction for different classifiers: tree classifiers (J48, Random Forest and Simple CART) and Decision Table (rule classifier) achieved the best results. Notably worse results were obtained for Naive Bayes and IBk.


Page 48: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont.

Results of classification for 3 tree classifiers. One can see that results for these 3 classifiers are very similar - the biggest difference is for the decay event which seemed harder to classify. Other events are well classified.

Results of event classification for decision tree classifiers


Page 49: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont.

Results of prediction obtained by probabilistic classifiers. BayesNet achieved quite good results, but NaiveBayes performed much worse.

Explanation: NaiveBayes is based on the assumption that the features used for the classification task are independent. This requirement is not met, because some values of one measure are correlated with values of another measure, e.g. density generally has higher values for smaller groups.

Results of event classification for probabilistic classifiers


Page 50: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont.

Here we can see results for the other tested classifiers.
The decay event is classified significantly worse than other events (as seen before).
The IBk classifier achieved worse prediction results than the DecisionTable one.
For the IBk classifier the hardest event to classify seemed to be constancy.

Results of event classification for other classifiers


Page 51: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont.

The most frequent event is the addition event (there are significantly more events of this type than of any other type). This is why this event is very well classified by each tested classifier.

The percentage of events in the dataset.


Page 52: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results

F-measure comparison for all event types (classes) and all classifiers.

3 tree classifiers achieved the best results (the worst F-measure value is 0.57 for continuing)

Of the remaining classifiers, the Decision Table also achieved quite good results.

F-measure for each event type (class) and each classifier.


Page 53: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results – cont.

Results of event classification for decision tree classifiers


Page 54: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results – cont.

Results of event classification for probabilistic classifiers


Page 55: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results – cont.

Results of event classification for probabilistic classifiers


Page 56: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results – cont.

Results of event classification for other classifiers


Page 57: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS Predicting Group Evolution Using GED Results – cont.

Each classifier achieves the best results for the splitting, merging and dissolving events and the worst for continuing, shrinking and growing.

Why?


Because of the uneven distribution of instances across the different event types.

Page 58: Different Approaches to Community Evolution Prediction in Blogosphere

EXPERIMENTS

The number of splitting events is much higher than for the rest of the events, probably because the time frame is too short for most communities: they continuously split and merge as service users migrate from one topic to another.

For the merging and dissolving events, most classifiers are able to produce very good results, despite the fact that these events constitute only a small fraction of all events.


Page 59: Different Approaches to Community Evolution Prediction in Blogosphere

DISCUSSION, CONCLUSIONS AND FUTURE WORK

The new method for future event prediction based on the SGCI algorithm was presented and compared with the method based on the GED algorithm.

A high level of prediction quality was obtained using both presented methods.


Page 60: Different Approaches to Community Evolution Prediction in Blogosphere

DISCUSSION, CONCLUSIONS AND FUTURE WORK

The best results:
    For both methods, the best results were obtained using different decision tree classifiers.

The worst results:
    In the SGCI method: using the Naive Bayesian classifier.
    In the GED method: using the Naive Bayesian and Bayes Network classifiers.


Page 61: Different Approaches to Community Evolution Prediction in Blogosphere

THE END
