DIFFERENT APPROACHES TO COMMUNITY EVOLUTION PREDICTION IN BLOGOSPHERE
Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak
OUTLINE:
• Introduction and motivation
• Methods of events identification in group evolution: SGCI, GED
• Predicting group evolution in the social network
• Dataset and experiment setup
• Classifiers – reminder
• For each method, a comparison of results between different classifiers
• Conclusion
GENERAL IDEA
Predicting the future direction of community evolution makes it possible to determine which characteristics of a community matter for its future behavior.
MOTIVATION
• Supports decisions about investing in contact with members of a given community and about actions aimed at achieving a key position in it.
• Makes it possible to determine effective ways of forming opinions.
• Makes it possible to protect group participants against such activities.
INTRODUCTION – PREDICTION
Link prediction (the best investigated problem): predicting the existence of a link (relation) between two nodes (users) within a social network.
• Liben-Nowell focused on paths and common neighbours between a pair of nodes.
• Lichtenwalter considered degrees and mutual information between them.
INTRODUCTION – PREDICTION
Link sign prediction: "sign" in this context means that the predicted relation between users may be positive or negative.
• Symeonidis looked at paths between the node pair and used the notion of similarity to predict the sign.
• Leskovec used degree and mutual information between a pair of nodes for link prediction, and drew on the theories of balance and status to predict the link sign.
Richter and Wai-Ho addressed the important task of churn prediction (churn: the number of individuals moving out of a collective over a specific period of time).
Richter presented a new approach that predicts churn based on an analysis of group behavior. This approach touches on another aspect, not yet well studied, in which the evolution of the whole group is predicted, i.e. which event will come next in the group's lifetime.
PREDICTION OF THE GROUP EVOLUTION
What is a group? A set of vertices that communicate with each other more frequently than with vertices outside the group.
A new method for future event prediction has been developed, based on the stable group changes identification (SGCI) algorithm.
Prediction in this method is based on:
• previous events in the group lifetime extracted by SGCI,
• the group profile, described by group size, cohesion, leadership and density.
METHODS OF EVENTS IDENTIFICATION IN GROUP EVOLUTION
SGCI ALGORITHM
Stable group changes identification
Step 1. Identification of fugitive groups in the separate time frames.
• The whole network is divided into time frames.
• In each time frame a community detection method is applied.
Step 2. Identification of group continuation: assigning transitions between groups in neighboring time steps.
• After communities have been extracted in the time frames, the communities from neighboring time frames are matched and the algorithm assigns transitions between them (from a group in time frame t to a group in time frame t+1).
SGCI ALGORITHM
For each pair of non-empty groups A, B from neighboring time slots we calculate:
• MJ(A,B): the modified Jaccard measure
• ds(A,B): the difference in size
If MJ(A,B) is above a defined threshold and ds(A,B) does not exceed a specified limit, the algorithm creates a transition between these groups.
Step 3. Separation of the stable groups (lasting for at least a required number of subsequent time steps). In this step the stable groups are retrieved.
Step 4. Identification of types of group changes. Events describing the change of the group's state are assigned to each transition between stable groups from neighboring time frames. Several types of group changes can be defined (A and B are the groups from the first and the second time frame of the transition, respectively); sh and dh are thresholds.
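The matching rule of Step 2 can be sketched in a few lines of Python. The slides name MJ and ds but do not show their formulas, so the definitions below are assumptions: MJ(A,B) is taken as max(|A∩B|/|A|, |A∩B|/|B|) and ds(A,B) as max(|A|/|B|, |B|/|A|). The function names and the default thresholds (MJ=0.5, ds=50, the values reported in the experiment setup) are illustrative only.

```python
from typing import Hashable, List, Set, Tuple

Group = Set[Hashable]


def modified_jaccard(a: Group, b: Group) -> float:
    """Assumed MJ: the larger of the two overlap fractions (0 for empty groups)."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return max(common / len(a), common / len(b))


def size_difference(a: Group, b: Group) -> float:
    """Assumed ds: ratio of the larger group size to the smaller one."""
    return max(len(a) / len(b), len(b) / len(a))


def find_transitions(frame_t: List[Group], frame_t1: List[Group],
                     mj_threshold: float = 0.5,
                     ds_limit: float = 50.0) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) for which a transition A_i -> B_j is created."""
    transitions = []
    for i, a in enumerate(frame_t):
        for j, b in enumerate(frame_t1):
            if not a or not b:
                continue  # the rule is defined for non-empty groups only
            if modified_jaccard(a, b) > mj_threshold and size_difference(a, b) <= ds_limit:
                transitions.append((i, j))
    return transitions


frame_t = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
frame_t1 = [{1, 2, 3, 4, 5, 11}, {20, 21, 22, 23, 24}]
transitions = find_transitions(frame_t, frame_t1)
```

In this toy input only the first group of frame t continues into the first group of frame t+1; the second group has no successor.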
SGCI ALGORITHM
• addition: a small group attaches to a large one.
• deletion: a small group detaches from a large one.
• merge: many groups in one time frame form a new, larger group in the next time frame.
• split: a group divides into smaller groups in the next time frame.
• split_merge: a group divides into at least 2 groups in the next time frame, and one of these groups is the result of merging with another group from the previous time frame.
• constancy: a simple transition without a significant change of the group size.
• change size: a simple transition with a change of the group size.
• decay: the group does not exist in the next time frame.
SGCI ALGORITHM
For a given group, more than one event can be matched between this group and groups in the next time frame. Some events can coexist, while others cannot:
• The constancy event cannot coexist with a change size, merge or split event.
• The constancy event can coexist with addition or deletion events.
• The addition and deletion events can coexist with every event type except decay.
• The decay event is always the only event for the group.
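The coexistence rules above can be written as a small validity check. This is a minimal sketch: only the rules stated on the slide are encoded, combinations the slide does not mention are accepted by assumption, and the event spellings (for example `change_size`) and the function name are illustrative.

```python
from typing import Iterable

# Events that the slide says cannot appear together with constancy.
INCOMPATIBLE_WITH_CONSTANCY = {"change_size", "merge", "split"}


def events_can_coexist(events: Iterable[str]) -> bool:
    """Check a set of events assigned to one group against the stated rules."""
    assigned = set(events)
    # Decay is always the only event for a group.
    if "decay" in assigned and len(assigned) > 1:
        return False
    # Constancy excludes change size, merge and split.
    if "constancy" in assigned and assigned & INCOMPATIBLE_WITH_CONSTANCY:
        return False
    # Anything else is accepted (assumption: the slide lists no further limits).
    return True
```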
GED: GROUP EVOLUTION DISCOVERY
The GED method relies on the inclusion measure, which evaluates the inclusion of one group in another. The inclusion of group G1 in group G2 combines a group quantity factor and a group quality factor, where NIG1(x) is the importance of node x in group G1.
The GED method therefore takes into account both the quantity and the quality of the group members.
The quality factor can be based on any user importance measure, e.g. degree centrality, betweenness centrality, PageRank, social position, etc.
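The inclusion formula itself did not survive in this transcript. The sketch below assumes the form usually given for GED: inclusion I(G1,G2) = (|G1∩G2| / |G1|) · (sum of NIG1(x) over x in G1∩G2 / sum of NIG1(x) over x in G1), i.e. group quantity multiplied by group quality. The function name and the importance values are illustrative.

```python
from typing import Dict, Hashable, Set


def inclusion(g1: Set[Hashable], g2: Set[Hashable],
              importance: Dict[Hashable, float]) -> float:
    """Assumed GED inclusion of g1 in g2; `importance` maps each node of g1 to NI_G1."""
    if not g1:
        return 0.0
    total_importance = sum(importance[x] for x in g1)
    if total_importance == 0:
        return 0.0
    common = g1 & g2
    quantity = len(common) / len(g1)                                  # group quantity
    quality = sum(importance[x] for x in common) / total_importance   # group quality
    return quantity * quality


g1 = {"a", "b", "c", "d"}
g2 = {"a", "b", "x"}
ni_g1 = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
value = inclusion(g1, g2, ni_g1)  # quantity 0.5 times quality 0.7
```

Half of G1's members are also in G2, and they carry 70% of G1's total importance, so the inclusion is higher than it would be if the two least important members were shared.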
PREDICTING GROUP EVOLUTION IN THE SOCIAL NETWORK
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
This approach to predicting future group events employs a classifier.
Structure: sequences of 3 group states (the present time frame and the two previous ones).
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Measures for the state of each group:
• leadership: a measure describing centralization in a graph or group (the largest value is for a star network), where dmax is the maximum degree in the group and n is the number of nodes in the group.
• density: a measure expressing how many connections between nodes are present in the network in relation to all possible connections between them [16], where a(i,j) = 1 when there is a connection from node i to node j.
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Measures for the state of each group – cont.:
• cohesion: a measure characterizing the strength of connections inside the group in relation to connections going outside the group (from group members), where w is a function assigning a weight to the connection between two nodes, G is the group, n is the number of nodes in the group and N is the number of nodes in the network.
• group size: the number of nodes in the group.
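The four profile measures can be computed together, as in the sketch below. The formulas were images on the slides and are not in this transcript, so the ones used here are assumptions: leadership is Freeman's degree centralization, the sum of (dmax - d_i) divided by (n-2)(n-1); density is the number of directed links inside the group divided by n(n-1); cohesion is the average inside weight per ordered pair divided by the average weight of links leaving the group, normalized by n(N-n). The function name and data layout are illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

Edges = Dict[Tuple[Hashable, Hashable], float]  # (source, target) -> weight


def group_profile(group: Set[Hashable], edges: Edges, network_size: int) -> Dict[str, float]:
    """Size, density, leadership and cohesion of a group (assumed formulas)."""
    n = len(group)
    inside = {(i, j): w for (i, j), w in edges.items()
              if i in group and j in group and i != j}
    outgoing = sum(w for (i, j), w in edges.items() if i in group and j not in group)

    # Degree of each member, counting distinct neighbours inside the group.
    neighbours = {v: set() for v in group}
    for i, j in inside:
        neighbours[i].add(j)
        neighbours[j].add(i)
    degrees = [len(s) for s in neighbours.values()]
    d_max = max(degrees, default=0)

    leadership = sum(d_max - d for d in degrees) / ((n - 2) * (n - 1)) if n > 2 else 0.0
    density = len(inside) / (n * (n - 1)) if n > 1 else 0.0

    inner_avg = sum(inside.values()) / (n * (n - 1)) if n > 1 else 0.0
    outer_pairs = n * (network_size - n)
    outer_avg = outgoing / outer_pairs if outer_pairs else 0.0
    cohesion = inner_avg / outer_avg if outer_avg > 0 else float("inf")

    return {"size": n, "density": density, "leadership": leadership, "cohesion": cohesion}


# A star: four members link to node 0, and one link leaves the group.
star = {0, 1, 2, 3, 4}
links = {(1, 0): 1.0, (2, 0): 1.0, (3, 0): 1.0, (4, 0): 1.0, (1, 9): 1.0}
profile = group_profile(star, links, network_size=10)
```

For the star group the leadership reaches its maximum of 1, which matches the remark on the slide that the largest value is obtained for a star network.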
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
The described sequence of group states is the input for the classifier. The predicted variable is the dominating next event for the last group in the sequence.
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Dominating event: one of the events assigned to a given group, namely the one with the highest priority among them. We use the following order of events (from the highest priority to the lowest): constancy, change size, split, merge, addition, deletion, split_merge, decay.
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
The group Gn,1 has two assigned events, change size and addition, so the dominating event for group Gn,1 is change size because this event has the higher priority.
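Selecting the dominating event amounts to a lookup in the priority order given on the slides. A minimal sketch follows; the identifier spellings (such as `change_size`) and the function name are illustrative.

```python
from typing import Iterable

# Priority order from the slides, highest priority first.
EVENT_PRIORITY = [
    "constancy", "change_size", "split", "merge",
    "addition", "deletion", "split_merge", "decay",
]
_RANK = {event: rank for rank, event in enumerate(EVENT_PRIORITY)}


def dominating_event(events: Iterable[str]) -> str:
    """Return the assigned event with the highest priority (lowest rank)."""
    assigned = list(events)
    if not assigned:
        raise ValueError("a group needs at least one assigned event")
    return min(assigned, key=lambda event: _RANK[event])


# The example from the slides: group Gn,1 with change size and addition.
result = dominating_event(["addition", "change_size"])
```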
PREDICTING GROUP EVOLUTION USING GED RESULTS
The idea is to use a simple sequence as the input for the classifier: preceding group profiles and events. The learnt model can then produce very good results even with simple classifiers.
The sequences of group sizes and events between time frames can be extracted from the GED results. For each event, four group profiles from the four previous time frames, together with the three associated events, are identified as the input for the classification model, separately for each group.
A single group in a given time frame (Tn) is a case (instance) for classification, for which its event TnTn+1 is predicted.
PREDICTING GROUP EVOLUTION USING GED RESULTS
The sequence presented in Figure 2 is used as the input for classification.
The first part of the sequence is used as input features (variables): the group profiles per time frame and the event types between them.
The goal of classification is to predict (classify) the type of Event TnTn+1 out of six possible classes: growing, continuing, shrinking, dissolving, merging and splitting. Forming was excluded since it can only start a sequence.
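One way to turn such a sequence into a classification instance is sketched below. It assumes each profile holds the four measures named in the experiment setup (size, density, cohesion, leadership) and interleaves the four profiles with the three events between them; the exact feature layout used by the authors is not given in the slides, and all names here are illustrative.

```python
from typing import Dict, List, Sequence, Tuple, Union

PROFILE_KEYS = ("size", "density", "cohesion", "leadership")
Feature = Union[float, str]


def build_instance(profiles: Sequence[Dict[str, float]],
                   events: Sequence[str],
                   target_event: str) -> Tuple[List[Feature], str]:
    """Flatten 4 profiles and the 3 events between them into one feature list."""
    if len(profiles) != 4 or len(events) != 3:
        raise ValueError("expected 4 group profiles and 3 events")
    features: List[Feature] = []
    for position, profile in enumerate(profiles):
        features.extend(profile[key] for key in PROFILE_KEYS)
        if position < len(events):
            features.append(events[position])  # event leading to the next frame
    return features, target_event


profiles = [
    {"size": 5, "density": 0.60, "cohesion": 1.2, "leadership": 0.40},
    {"size": 7, "density": 0.50, "cohesion": 1.1, "leadership": 0.35},
    {"size": 7, "density": 0.52, "cohesion": 1.3, "leadership": 0.30},
    {"size": 6, "density": 0.55, "cohesion": 1.0, "leadership": 0.45},
]
features, label = build_instance(profiles, ["growing", "continuing", "shrinking"], "splitting")
```

The label is the event TnTn+1 to be predicted; the values in the example are made up for illustration.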
DATASET AND EXPERIMENT SETUP
Dataset description:
• Data from www.salon24.pl, which contains many blogs (mainly political)
• 26,722 users
• 285,532 posts
• 4,173,457 comments
• For tests we use half of the data set: 04/04/2010 – 31/03/2012
• Each time frame lasts 7 days
• Time frames overlap each other by 4 days
• This yields a total of 182 time frames
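Splitting a date range into overlapping time frames can be sketched as follows. The slides do not say how the shift between frames is counted or whether partial frames at the end are kept, so the frame length and the shift are explicit parameters here and the number of frames this sketch produces for the full date range need not match the 182 reported. The function name is illustrative.

```python
from datetime import date, timedelta
from typing import List, Tuple


def time_frames(start: date, end: date,
                length_days: int, step_days: int) -> List[Tuple[date, date]]:
    """Return (first_day, last_day) pairs of frames that fit fully within [start, end]."""
    if length_days <= 0 or step_days <= 0:
        raise ValueError("length_days and step_days must be positive")
    last_offset = timedelta(days=length_days - 1)
    frames = []
    frame_start = start
    while frame_start + last_offset <= end:
        frames.append((frame_start, frame_start + last_offset))
        frame_start += timedelta(days=step_days)
    return frames


# Toy range of 14 days: 7-day frames shifted by 3 days share 4 days with their neighbour.
frames = time_frames(date(2010, 4, 4), date(2010, 4, 17), length_days=7, step_days=3)
```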
DATASET AND EXPERIMENT SETUP
Group extraction: after the time frames had been separated, groups were extracted in each of them.
This was done using the CPM method (CPMd version) from the CFinder tool (http://www.cfinder.org/) for k=5.
CFinder is a tool for finding and visualizing overlapping dense groups of nodes in networks, based on the Clique Percolation Method (CPM).
DATASET AND EXPERIMENT SETUP
Group sizes: as Figure 3 shows, there are many small groups, and groups of size 5 outnumber the others.
DATASET AND EXPERIMENT SETUP
Experiment setup:
• SGCI method experiments were conducted using the following parameters: MJ=0.5, ds=50, sh=10 and dh=0.05.
• The GED method was run on the dataset with all combinations of GED parameters from the set:
  Quantity: {50%, 60%, 70%, 80%, 90%, 100%}.
  Quality (node importance): the social position measure was utilized (a measure similar to PageRank).
DATASET AND EXPERIMENT SETUP
Experiment setup:
• To describe the group profile, its size, density, cohesion and leadership were used.
• Seven different classifiers were utilized with default settings.
• All classifiers were utilized for both approaches: SGCI and GED.
DATASET AND EXPERIMENT SETUP
Classifiers – reminder:
What is a classifier? An adaptive system that learns to perform the best action given its input: identifying to which of a set of categories (sub-populations) a new observation belongs.
What is multiclass classification? Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.
DATASET AND EXPERIMENT SETUP
Multi-class classification, direct approaches:
• Nearest Neighbor
• Generative approach & Naïve Bayes
• Linear classification
Multi-label classification:
• Nested/Hierarchical: Is it edible? Is it sweet? Is it a fruit? Is it a banana?
• Exclusive/Multi-class: Is it a banana? Is it an apple? Is it an orange? Is it a pineapple?
• General/Structured: Is it a banana? Is it yellow? Is it sweet? Is it round?
DATASET AND EXPERIMENT SETUP
Multi-Class Classification – real world examples:
Application: number of classes
• Object recognition: 100
• Automated protein classification: 300-600
• Digit recognition: 10
• Phoneme recognition: 50
DATASET AND EXPERIMENT SETUP
A Simple Idea: One-vs-All Classification
• Pick a good technique for building binary classifiers.
• Build N different binary classifiers: a single classifier is trained per class to distinguish that class from all other classes.
• For the i'th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i.
• Let fi be the i'th classifier. Classify with f(x) = arg max_i fi(x).
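The one-vs-all scheme can be illustrated with a short, self-contained sketch. The binary technique used here, a linear scorer built from the difference of class centroids, is a deliberately simple stand-in chosen for brevity; it is not one of the classifiers used in the experiments, and all names are illustrative.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Point = Sequence[float]
Scorer = Callable[[Point], float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def train_binary(positives: List[Point], negatives: List[Point]) -> Scorer:
    """Linear scorer: positive on the side of the positive centroid."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    weights = [a - b for a, b in zip(pos_c, neg_c)]
    midpoint = [(a + b) / 2 for a, b in zip(pos_c, neg_c)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def train_one_vs_all(points: List[Point], labels: List[Hashable]) -> Dict[Hashable, Scorer]:
    """Train one binary scorer f_i per class: class i against all other classes."""
    scorers = {}
    for cls in set(labels):
        positives = [p for p, l in zip(points, labels) if l == cls]
        negatives = [p for p, l in zip(points, labels) if l != cls]
        scorers[cls] = train_binary(positives, negatives)
    return scorers


def predict(scorers: Dict[Hashable, Scorer], x: Point) -> Hashable:
    """Classify with arg max over the per-class scores f_i(x)."""
    return max(scorers, key=lambda cls: scorers[cls](x))


points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (1, 10), (0, 11)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
model = train_one_vs_all(points, labels)
```

With three well-separated clusters the arg max picks the class whose scorer is most confident, which is the whole of the one-vs-all decision rule.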
DATASET AND EXPERIMENT SETUP
Leadership  Density  Cohesion  Group size  Group
0.85        0.43     0.28      5           A
0.71        0.48     0.30      6           A
0.65        0.89     0.99      13          B
0.97        0.53     0.62      18          A
0.47        0.56     0.75      4           C
0.21        0.12     0.93      5           B
0.35        0.39     0.92      6           B
0.37        0.42     0.48      9           C
0.88        0.18     0.21      9           A
0.46        0.72     0.84      6           B
0.12        0.85     0.87      14          B
0.28        0.27     0.35      12          A
Decision tree built from the table:
LEADERSHIP
  > 0.7: A
  < 0.7: DENSITY
    < 0.2: B
    > 0.2: COHESION
      > 0.8: B
      < 0.8: GROUP SIZE
        < 10: C
        > 10: A
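The finished tree can be written as nested conditions and checked against the example table. This is a sketch of the slide example only; the rows are copied from the table, the function name is illustrative, and the handling of values exactly on a threshold (which the slides do not define and which do not occur in the table) is an assumption.

```python
from typing import List, Tuple

# (leadership, density, cohesion, group_size, group) rows from the example table.
ROWS: List[Tuple[float, float, float, int, str]] = [
    (0.85, 0.43, 0.28, 5, "A"), (0.71, 0.48, 0.30, 6, "A"),
    (0.65, 0.89, 0.99, 13, "B"), (0.97, 0.53, 0.62, 18, "A"),
    (0.47, 0.56, 0.75, 4, "C"), (0.21, 0.12, 0.93, 5, "B"),
    (0.35, 0.39, 0.92, 6, "B"), (0.37, 0.42, 0.48, 9, "C"),
    (0.88, 0.18, 0.21, 9, "A"), (0.46, 0.72, 0.84, 6, "B"),
    (0.12, 0.85, 0.87, 14, "B"), (0.28, 0.27, 0.35, 12, "A"),
]


def classify(leadership: float, density: float, cohesion: float, group_size: int) -> str:
    """Walk the tree from the root (leadership) down to a leaf label."""
    if leadership > 0.7:
        return "A"
    if density < 0.2:
        return "B"
    if cohesion > 0.8:
        return "B"
    # Assumption: a value exactly on a threshold falls through to the next test.
    return "C" if group_size < 10 else "A"


predictions = [classify(*row[:4]) for row in ROWS]
accuracy = sum(p == row[4] for p, row in zip(predictions, ROWS)) / len(ROWS)
```

Every row of the table ends in the leaf that carries its own group label, so the tree reproduces the training data exactly.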
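EXPERIMENTS
Predicting Group Evolution Using SGCI Results
The selected measure is the F-measure (also known as the F1-measure), which summarizes the accuracy of the result by combining precision and recall:
$\mathrm{precision} = \frac{TP}{TP + FP}$
$\mathrm{recall} = \frac{TP}{TP + FN}$
The F-measure is: $F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
where TP, FP and FN are the numbers of true positives, false positives and false negatives for a given class.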
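For a multi-class task such as event prediction, the F-measure is usually computed per class from the true and predicted labels. The sketch below uses the standard definitions of precision and recall; the function name and the toy labels are illustrative and are not taken from the experiments.

```python
from typing import Dict, Hashable, Sequence


def f_measure_per_class(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 score of each class, treating that class as positive and the rest as negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        scores[label] = 2 * precision * recall / denominator if denominator else 0.0
    return scores


y_true = ["addition", "addition", "addition", "decay", "decay", "split"]
y_pred = ["addition", "addition", "decay", "decay", "addition", "split"]
scores = f_measure_per_class(y_true, y_pred)
```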
EXPERIMENTS
Predicting Group Evolution Using SGCI Results
Results of event prediction for different classifiers: the tree classifiers (J48, Random Forest and Simple CART) and Decision Table (a rule classifier) achieved the best results. Notably worse results were obtained for Naive Bayes and IBk.
EXPERIMENTS
Predicting Group Evolution Using SGCI Results – cont.
Results of classification for the 3 tree classifiers: the results are very similar. The biggest difference is for the decay event, which seemed harder to classify; the other events are well classified.
Figure: Results of event classification for decision tree classifiers
EXPERIMENTS
Predicting Group Evolution Using SGCI Results – cont.
Results of prediction obtained by probabilistic classifiers: BayesNet achieved quite good results, but NaiveBayes much worse.
Explanation: the Naive Bayes classifier is based on the assumption that the features used for the classification task are independent. This requirement is not met because values of one measure are correlated with values of another, e.g. density generally has higher values for smaller groups.
Figure: Results of event classification for probabilistic classifiers
EXPERIMENTS
Predicting Group Evolution Using SGCI Results – cont.
Results for the other tested classifiers:
• The decay event is classified significantly worse than the other events (as seen before).
• The IBk classifier achieved worse prediction results than DecisionTable.
• For the IBk classifier the hardest event to classify seemed to be constancy.
Figure: Results of event classification for other classifiers
EXPERIMENTS
Predicting Group Evolution Using SGCI Results – cont.
The most frequent event is addition (there are significantly more events of this type than of any other type). This is why this event is classified very well by each tested classifier.
Figure: The percentage of events in the dataset.
EXPERIMENTS
Predicting Group Evolution Using GED Results
F-measure comparison for all event types (classes) and all classifiers:
• The 3 tree classifiers achieved the best results (their worst F-measure value is 0.57, for continuing).
• Of the remaining classifiers, Decision Table also achieved quite good results.
Figure: F-measure for each event type (class) and each classifier.
EXPERIMENTS
Predicting Group Evolution Using GED Results – cont.
Figure: Results of event classification for decision tree classifiers
Figure: Results of event classification for probabilistic classifiers
Figure: Results of event classification for other classifiers
EXPERIMENTS
Predicting Group Evolution Using GED Results – cont.
Each classifier achieves the best results for the splitting, merging and dissolving events and the worst for continuing, shrinking and growing.
Why?
Because of the uneven distribution of instances across event types.
EXPERIMENTS
The number of splitting events is much higher than for the other events, probably because the time frame is too short for most communities: they continuously split and merge as service users migrate from one topic to another.
For the merging and dissolving events, most classifiers are able to produce very good results, despite the fact that these events constitute only a small fraction of all events.
DISCUSSION, CONCLUSIONS AND FUTURE WORK
• A new method for future event prediction based on the SGCI algorithm was presented and compared with the method based on the GED algorithm.
• A high level of prediction quality was obtained using both presented methods.
DISCUSSION, CONCLUSIONS AND FUTURE WORK
The best results: for both methods, the best results were obtained using different decision tree classifiers.
The worst results:
• In the SGCI method: using the Naive Bayes classifier.
• In the GED method: using the Naive Bayes and Bayes Network classifiers.
THE END