Group Recommendation System for Facebook Enkh-Amgalan Baatarjav Jedsada Chartree Thiraphat Meesumrarn University of North Texas

Upload: albert-oswald-wright

Post on 29-Dec-2015

Page 1:

Group Recommendation System for Facebook

Enkh-Amgalan Baatarjav, Jedsada Chartree, Thiraphat Meesumrarn

University of North Texas

Page 2:

Overview

- Evolution of Communication
- Online Social Networking (OSN)
- Architecture
  - Profile feature
  - Profile analysis
  - Similarity inference
  - Clustering coefficient
  - Decision tree
- Conclusion

- Traditional medium of communication: mail, telephone, fax, e-mail, etc.
- Key to successful communication: sharing a common value

Page 3:

Online Social Networking

- User-driven content
- Overwhelming number of groups
- Finding suitable groups
- Sharing a common value
- Improving online social network

Page 4:

Architecture

- Profile feature extraction
- Classification engine
  - Clustering
  - Building decision tree
- Group recommendation

Page 5:

Profile Feature

Group profile defined by profile features of users:

- Time Zone
- Age
- Gender
- Relationship Status
- Political View
- Activities
- Interest
- Music
- TV shows
- Movies
- Books
- Affiliations
- Note counts
- Wall counts
- Number of Friends

Page 6:

Profile Analysis

Group  Subtype                Size  Description
G1     Friends                12    Friends group for one who is going abroad
G2     Politics               169   Campaign for running student body
G3     Languages              10    Spanish learners
G4     Beliefs & causes       46    Campaign for homecoming king and queen
G5     Beauty                 12    Wearing same pants every day
G6     Beliefs & causes       41    Friends group
G7     Food & Drink           57    Lovers of Asian food restaurant
G8     Religion/Spirituality  42    Learning about God
G9     Age                    22    Friends group
G10    Activities             40    People who play clarinets
G11    Sexuality              319   Against gay marriage
G12    Beliefs & causes       86    Friends group
G13    Sexuality              36    People who think fishnet is fetish
G14    Activities             179   People who dislike early morning classes
G15    Politics               195   Group for democrats
G16    Hobbies & Crafts       33    People who enjoy Half-Life (PC game)
G17    Politics               281   Not a Bush fan

[Figure: percentage of members in each group (G1 to G17) by age range. Legend: Hidden, 15-19, 20-24, 25-29, 30-36]

[Figure: percentage of members in each group (G1 to G17) by gender. Legend: Male, Female]

[Figure: percentage of members in each group (G1 to G17). Legend: Hidden, VL, Li, M, C, VC, A, Ln. X axis: Groups]

Page 7:

Similarity Inference

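- Hierarchical clustering
  - Normalizing data to [0, 1]
  - Computing distance matrix to calculate similarity among all pairs of members (a)
  - Finding average distance between all pairs in given two clusters s and r (b)

(a) $d_{rs} = \sqrt{\sum_{i=1}^{N} (x_{ri} - x_{si})^2}$

(b) $d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$

In (a), r and s index two members and i runs over the N profile features. In (b), r and s are clusters of sizes n_r and n_s, and x_{ri} is the i-th member of cluster r.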
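The three steps on this slide (normalization, pairwise distance, average distance between two clusters) can be illustrated with a minimal Python sketch. It assumes min-max scaling for the [0, 1] normalization and Euclidean distance for (a); the function names and the toy two-feature profiles are hypothetical and are not taken from the slides.

```python
import math

def normalize(rows):
    """Min-max scale each feature column to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def euclidean(a, b):
    """Distance (a): Euclidean distance between two members' feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows):
    """Distances between all pairs of members."""
    n = len(rows)
    return [[euclidean(rows[r], rows[s]) for s in range(n)] for r in range(n)]

def average_linkage(cluster_r, cluster_s, dist):
    """Distance (b): mean distance over all pairs drawn from two clusters.

    cluster_r and cluster_s are lists of member indices into dist.
    """
    total = sum(dist[i][j] for i in cluster_r for j in cluster_s)
    return total / (len(cluster_r) * len(cluster_s))

# Hypothetical toy profiles: [age, gender code]
profiles = [[18, 0], [20, 1], [35, 0], [36, 1]]
scaled = normalize(profiles)
dist = distance_matrix(scaled)
d_rs = average_linkage([0, 1], [2, 3], dist)
```

An agglomerative procedure would repeatedly merge the two clusters with the smallest value of (b); that merge loop is omitted here to keep the sketch short.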

Page 8:

Clustering Coefficient

- R_i is the normalized Euclidean distance of member i from the center
- N_k is the normalized number of members within distance k from the center

$C_i = \frac{N_{R_i}}{R_i}$

$R_i = \frac{r_i}{\max_j r_j}$

$N_k = \frac{n_k}{M}$

[Figure: clustering coefficient C plotted against R_i, with C_max and R_X marked]
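A minimal Python sketch of the clustering coefficient follows. Several points are assumptions rather than statements from the slides: the center is taken as the per-feature mean of the group, M is taken as the group size, and members farther from the center than R_X (the distance at which C peaks) are treated as noise. The function names and toy data are hypothetical.

```python
import math

def clustering_coefficients(members):
    """Return (R, C): normalized distances and clustering coefficients."""
    m = len(members)
    dims = len(members[0])
    # Assumption: the center is the per-feature mean of the group.
    center = [sum(row[d] for row in members) / m for d in range(dims)]
    r = [math.sqrt(sum((v - c) ** 2 for v, c in zip(row, center)))
         for row in members]
    r_max = max(r)
    if r_max == 0:
        raise ValueError("all members coincide with the center")
    R = [ri / r_max for ri in r]          # R_i = r_i / max_j r_j
    C = []
    for Ri in R:
        if Ri == 0:
            C.append(None)                # C_i is undefined at the center
            continue
        n_k = sum(1 for Rj in R if Rj <= Ri)
        C.append((n_k / m) / Ri)          # C_i = N_{R_i} / R_i
    return R, C

def noise_threshold(R, C):
    """Return (R_X, C_max): the distance at which C peaks, and the peak."""
    defined = [i for i, c in enumerate(C) if c is not None]
    best = max(defined, key=lambda i: C[i])
    return R[best], C[best]

# Hypothetical one-feature group with one outlying member
members = [[0.0], [0.1], [0.2], [1.0]]
R, C = clustering_coefficients(members)
R_x, C_max = noise_threshold(R, C)
kept = [i for i, Ri in enumerate(R) if Ri <= R_x]
```

In this toy group the three members near the center are kept and the outlying fourth member falls beyond R_X.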

Page 9:

Decision Tree

- Decision tree algorithm, based on binary recursive partitioning
- Splitting rules: Gini, Twoing, Deviance
- Tree optimization: cross-validation (computationally intensive)
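To make binary recursive partitioning concrete, here is a short Python sketch of a single split chosen with the Gini rule. It handles one numeric feature only and is an illustration, not the tool used for the slides; the function names and toy labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold t that minimizes the weighted Gini impurity
    of the two children (x <= t) and (x > t).

    Returns (None, parent impurity) when no split improves on the parent.
    """
    n = len(values)
    best_t, best_score = None, gini(labels)
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical feature values and group labels
threshold, impurity = best_split([1, 2, 8, 9], ["g1", "g1", "g2", "g2"])
```

A full tree would apply `best_split` to every feature, keep the best one, and recurse on each child. Each split tests a single feature against a threshold, so the resulting decision boundaries are parallel to the coordinate axes.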

Page 10:

After Data Cleaning

- Fair representation of group profile
  - Groups must have at least 10 members
- Reduction
  - Users: from 1,580 to 1,023
  - Groups: from 17 to 7

Group  Size
1      274
2      226
3      159
4      151
5      133
6      67
7      13

Page 11:

Result 1

- Data set
  - Training: 75%
  - Testing: 25%
- Accuracy calculation: 25-fold test
- Accuracy: 27%
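The evaluation protocol can be sketched in Python as follows. The slide does not say how the "25 fold test" was run; this sketch assumes 25 independent random 75%/25% splits whose accuracies are averaged. The `fit` and `predict` callables and the majority-class stand-in classifier are hypothetical placeholders, not the decision tree from the slides.

```python
import random
from collections import Counter

def repeated_holdout(samples, labels, fit, predict,
                     rounds=25, train_frac=0.75, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    cut = int(len(idx) * train_frac)
    accuracies = []
    for _ in range(rounds):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

# Hypothetical stand-in classifier: always predict the most common label.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, sample):
    return model

# Hypothetical toy data: 40 one-feature samples in two groups
samples = [[i] for i in range(40)]
labels = ["g1"] * 30 + ["g2"] * 10
accuracy = repeated_holdout(samples, labels, fit_majority, predict_majority)
```

Any classifier can be plugged in through `fit` and `predict`; fixing `seed` keeps the splits reproducible.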

Page 12:

Statistical Analysis: Mean

Page 13:

Statistical Analysis: STD

Page 14:

Adjustment in Feature Selection

- Feature score calculation
  - Using group profile: FSGP
  - Using group closeness: FSGC
  - Combination of FSGP and FSGC: FSPC

$FSGP_f = \mathrm{STD}_f(GP_g)$
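One possible reading of the FSGP score is sketched below in Python: for each feature f, take the standard deviation across groups g of the group-profile value GP. The slide does not define GP or the kind of standard deviation, so this sketch assumes GP is the per-group mean of the normalized feature and uses the population standard deviation. The function names and toy groups are hypothetical.

```python
import statistics

def group_profile(members):
    """Assumed group profile: mean of each feature over a group's members."""
    return [statistics.mean(col) for col in zip(*members)]

def fsgp(groups):
    """Per-feature standard deviation of group-profile values across groups."""
    profiles = [group_profile(members) for members in groups.values()]
    return [statistics.pstdev(col) for col in zip(*profiles)]

# Hypothetical normalized data: two groups, two features per member
groups = {
    "G1": [[0.1, 0.5], [0.3, 0.5]],
    "G2": [[0.8, 0.5], [1.0, 0.5]],
}
scores = fsgp(groups)
# Rank feature indices from most to least discriminating between groups.
ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
```

Under this reading, a feature whose group profiles differ widely gets a high score, while a feature that looks the same in every group scores zero and is a candidate for removal.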

Page 15:

FSGP vs Accuracy

Page 16:

FSGC vs Accuracy

Page 17:

FSPC vs Accuracy

Page 18:

Result 2

Feature Score Calculation  Accuracy (%)
Group–Profile Feature      24.47
STD of means               25.04
Mean of STDs               21.75

Page 19:

Conclusion

- Improving QoS of online social networking
- Architecture
  - Hierarchical clustering
  - Threshold value to reduce noise
  - Decision tree
- Result: causes of poor performance
  - Decision tree: decision boundaries are parallel to the coordinate axes
  - Overlapping data
- More work on data cleaning
- Feature reduction: from 12 to 2