
Page 1

Lehrstuhl Informatik 5 (Information Systems)

Prof. Dr. M. Jarke
I5-DR-0312-1

Khaled Rashed

Cristina Balasoiu

Ralf Klamma

Deutschen Akademischen Austauschdienstes

CollaborateCom 2012
Robust Expert Ranking in Online Communities - Fighting Sybil Attacks

Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
RWTH Aachen University

Advanced Community Information Systems (ACIS)
{rashed|balsoiu|klamma}@dbis.rwth-aachen.de

8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
October 14–17, 2012, Pittsburgh, Pennsylvania, United States

Page 2

Figure: Advanced Community Information Systems (ACIS) research areas: Responsive Open Community Information Systems; Community Visualization and Simulation; Community Analytics; Community Support; Web Analytics; Web Engineering; Requirements Engineering

Page 3

Agenda

Introduction and motivation

Related work

Our Approach

– Expert ranking algorithm

– Robustness of the expert ranking algorithm

Evaluation

Conclusions and outlook

Page 4

Introduction

Expert search and ranking refers to finding a group of authoritative users with special skills and knowledge in a specific category.

This task is important in online collaborative systems.

Problems: openness and misbehaviour
– No attention has been paid to the trust and reputation of experts

Solution: leveraging trust

Page 5

Motivation Examples

Tidal bores presented as the Indian Ocean tsunami
– Published as: 2004 Indian Ocean tsunami
– Proved to be: tidal bores, a four-day-long government-sponsored tourist festival in China

Manipulating the truth for war propaganda
– Published as: British soldiers abusing prisoners in Iraq
– Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq

Expert knowledge, analysis and witnesses are needed to identify the fake!

Page 6

A Case Study: Collaborative Fake Multimedia Detection System

Collaborative activities (rating, tagging and commenting)
– Provide new means of search, retrieval and media authenticity evaluation
– Explicit ratings and tags are used to evaluate the authenticity of multimedia items
– Reliability: not all of the submitted ratings are reliable
– No centralized control mechanism
– Vulnerability to attacks

Three types of users
– Honest users
– Experts
– Malicious users

Page 7

Research Questions and Goals

Research questions
– How can users' expertise be measured in collaborative media sharing and evaluation systems, and how can users be ranked?
– What are the implications of trust?
– Robustness: how can the robustness of the ranking algorithm be ensured?

Goals
– Improve multimedia evaluation
– Reduce the impact of malicious users

Page 8

Related Work

Probabilistic models, e.g. [Tu et al. 2010]

Voting models [Macdonald and Ounis 2006], [Macdonald et al. 2008]

Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations; SPEAR algorithm [Noll et al. 2009]

ExpertRank [Jiao et al. 2009]

TREC Enterprise track: finding the associations between candidates and documents, e.g. [Balog 2006, Balog 2007]

Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]

Page 9

Our Approach

Assumptions
– Expert users tend to have many authenticity ratings
– Correctly evaluated media are rated by users of high expertise
– Following expert users provides more benefits

Expert definition
– Rates a large number of media files correctly with respect to a topic, and is highly trusted by his or her directly connected users
– Should be trustworthy in evaluating multimedia

Page 10

Expert Ranking Methods

Domain knowledge driven method
– Considers the tags that users assign to media files
– User profile: merges the tags a user submitted to the media files in the system
– Similarity coefficient between the candidate profile and the tags assigned to a specific resource
– Used to reorder the users who voted on a media file according to their tag profile

Domain knowledge independent method
– Uses the connections between users and resources to decide on the expertise of the users
– A modified version of the HITS algorithm
– Mutual reinforcement of user expertise and media
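The domain knowledge driven method can be sketched in a few lines. The slide does not name the similarity coefficient, so this sketch assumes the Jaccard coefficient over tag sets; `build_profiles`, `jaccard` and `rerank_voters` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def build_profiles(tag_assignments):
    """User profile: merge every tag a user submitted into one set.

    tag_assignments: iterable of (user, media, tag) triples.
    """
    profiles = defaultdict(set)
    for user, _media, tag in tag_assignments:
        profiles[user].add(tag)
    return profiles


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity coefficient)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_voters(voters, resource_tags, profiles):
    """Reorder the users who voted on a media file by profile/tag similarity."""
    return sorted(
        voters,
        key=lambda u: jaccard(profiles.get(u, set()), resource_tags),
        reverse=True,
    )


assignments = [
    ("alice", "m1", "tsunami"), ("alice", "m2", "flood"),
    ("bob", "m1", "festival"),
    ("carol", "m3", "tsunami"), ("carol", "m3", "flood"), ("carol", "m3", "asia"),
]
profiles = build_profiles(assignments)
print(rerank_voters(["bob", "carol", "alice"], {"tsunami", "flood"}, profiles))
# ['alice', 'carol', 'bob']
```

Cosine similarity over tag-frequency vectors would be an equally plausible choice; Jaccard is used here only because it needs no weighting scheme.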

Page 11

MHITS: Expert Ranking Algorithm

MHITS: expert ranking algorithm for online collaborative systems
– Link-based approach, based on the HITS algorithm

HITS
– Authorities: pages that are pointed to by good hubs
– Hubs: pages that point to good authorities
– Mutual reinforcement between hubs and authorities

MHITS
– Users act as hubs (scored by the correctly evaluated media they rated)
– Media files act as authorities
– Mutual reinforcement between users and media files
– Local trust values between users are assigned
– Considers the ratings of the users
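For reference, the classic HITS mutual reinforcement that MHITS modifies can be sketched as below. This is plain HITS on a directed graph with L2 normalisation after each step (a common convention, assumed here); `hits` is a hypothetical helper name, not part of the authors' implementation.

```python
import math


def hits(edges, iterations=50):
    """Classic HITS on a directed graph given as (source, target) pairs.

    Returns (hub, authority) dicts, each L2-normalised after every step.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of the pages pointing to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a page: sum of the authority scores of the pages it points to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


hub, auth = hits([("a", "x"), ("a", "y"), ("b", "x")])
print(max(auth, key=auth.get), max(hub, key=hub.get))  # x a
```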

Page 12

MHITS: Expert Ranking Algorithm

Symbol   Description
a(m)     Authority score of media file m
U(m)     Set of users pointing to media file m
h(u)     Hubness score of user u
r(u)     Rating of user u for media file m
t(u)     Average trust that the directly connected users place in user u
M(u)     Set of media files to which user u points
β        Coefficient that weights the influence of the two terms, in range [0, 1]

$$a(m) = \sum_{u \in U(m)} h(u)\, r(u)$$

$$h(u) = \beta \sum_{m \in M(u)} a(m)\, r(u) + (1 - \beta)\, t(u)$$

Two networks are used: one for users and ratings, and one for users only (the trust network).

Trust values lie in [0, 1]; ratings are 0.5 for a fake vote and 1 for an authentic vote.
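A minimal sketch of the MHITS update rules, a(m) = Σ_{u∈U(m)} h(u) r(u) and h(u) = β Σ_{m∈M(u)} a(m) r(u) + (1 − β) t(u), with r(u) stored per (user, media) pair. Several details are assumptions not stated on the slides: both score vectors are rescaled by their maximum after every iteration, users with no incoming trust get t(u) = 0, β defaults to 0.5, and a fixed number of iterations is run. `mhits`, `average_trust` and `scale_to_max` are hypothetical names; this is not the authors' Java implementation.

```python
from collections import defaultdict


def average_trust(trust_edges):
    """t(u): mean trust value that u's directly connected users assign to u.

    trust_edges: iterable of (truster, trustee, value), value in [0, 1].
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for _src, dst, value in trust_edges:
        totals[dst] += value
        counts[dst] += 1
    return {u: totals[u] / counts[u] for u in totals}


def scale_to_max(scores):
    """Divide by the largest score so values lie in [0, 1] (assumed normalisation)."""
    top = max(scores.values(), default=0.0) or 1.0
    return {k: v / top for k, v in scores.items()}


def mhits(ratings, trust_edges, beta=0.5, iterations=50):
    """ratings: {(user, media): r} with r = 1.0 (authentic vote) or 0.5 (fake vote).

    Returns (hub, auth); hub[u] is the expertise score of user u.
    """
    trust = average_trust(trust_edges)
    users = {u for u, _ in ratings}
    media = {m for _, m in ratings}
    hub = {u: 1.0 for u in users}
    auth = {}
    for _ in range(iterations):
        # a(m) = sum over u in U(m) of h(u) * r(u)
        auth = {m: 0.0 for m in media}
        for (u, m), r in ratings.items():
            auth[m] += hub[u] * r
        auth = scale_to_max(auth)
        # h(u) = beta * sum over m in M(u) of a(m) * r(u) + (1 - beta) * t(u)
        link = {u: 0.0 for u in users}
        for (u, m), r in ratings.items():
            link[u] += auth[m] * r
        hub = scale_to_max(
            {u: beta * link[u] + (1 - beta) * trust.get(u, 0.0) for u in users}
        )
    return hub, auth


ratings = {("alice", "m1"): 1.0, ("alice", "m2"): 1.0,
           ("bob", "m1"): 0.5, ("carol", "m2"): 1.0}
trust = [("bob", "alice", 0.9), ("carol", "alice", 0.8), ("alice", "bob", 0.2)]
hub, auth = mhits(ratings, trust)
print(max(hub, key=hub.get))  # alice
```

Setting β = 1 drops the trust term, so the hub score then depends only on the rated media, as in plain HITS with rating weights.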

Page 13

Robustness of the MHITS Algorithm

Compromising techniques
– Sybil attack [Douc02], reputation theft, whitewashing attack, etc.
– Compromising the input and the output of the algorithm

Sybil attack
– A fundamental problem in online collaborative systems
– A malicious user creates many fake accounts (Sybils) that all reference him to boost his reputation; the attacker's goal is to rank higher

Countermeasures against the Sybil attack

                                  SybilGuard [YKGF06]   SybilLimit [YGKX08]   SumUp [TMLS09]
Protocol type                     Decentralized         Decentralized         Centralized
Accepted Sybils per attack edge
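A toy illustration of why the Sybil attack works against naive link-based reputation. It assumes a hypothetical reputation score that simply counts incoming endorsements (this is neither MHITS nor any method from the slides): an attacker who creates k fake accounts gains k endorsements without convincing a single honest user.

```python
from collections import Counter


def indegree_reputation(endorsements):
    """Naive reputation: number of accounts endorsing each account."""
    return Counter(dst for _src, dst in endorsements)


def add_sybils(endorsements, attacker, k):
    """Copy of the graph plus k fake accounts that all endorse the attacker."""
    sybils = [f"sybil{i}" for i in range(k)]
    return endorsements + [(s, attacker) for s in sybils]


honest = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("bob", "eve")]
before = indegree_reputation(honest)
after = indegree_reputation(add_sybils(honest, "eve", 10))
print(before.most_common(1))  # [('bob', 3)]
print(after.most_common(1))   # [('eve', 11)]
```

Defenses such as SybilGuard, SybilLimit and SumUp instead bound what Sybils can gain by the number of attack edges connecting them to honest users.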

Page 14

SumUp
– Centralized approach
– Aims to aggregate votes in a Sybil-resilient manner
– Key idea: an adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object

New: we integrate SumUp with the MHITS Java implementation, using our own data structure based on Java Sparse Arrays

SumUp steps
(1) Assign the source node and the number of votes per media file
(2) Level assignment
(3) Pruning step
(4) Capacity assignment
(5) Max-flow computation: collect votes on each resource
(6) Leverage user history to penalize adversarial nodes
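Steps (1), (2), (4) and (5) can be sketched as a vote-collecting max flow. This is a simplified illustration loosely following the SumUp idea, not the SumUp specification or the authors' code: the trust graph is treated as undirected, the pruning step (3) and the history penalty (6) are omitted, and the ticket rule (each node keeps one ticket and splits the rest evenly over its links to the next level; such a link gets capacity 1 + tickets toward the source) is an assumption. All function names are hypothetical.

```python
from collections import defaultdict, deque


def bfs_levels(adj, source):
    """Step 2: level of each node = hop distance from the vote collector."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        n = queue.popleft()
        for nb in adj[n]:
            if nb not in level:
                level[nb] = level[n] + 1
                queue.append(nb)
    return level


def assign_capacities(adj, source, level, c_max):
    """Step 4 (simplified): spread c_max tickets outward level by level."""
    cap = defaultdict(float)
    for n in adj:
        for nb in adj[n]:
            cap[(n, nb)] = 1.0  # default capacity of every trust link
    tickets = defaultdict(float)
    tickets[source] = float(c_max)
    for n in sorted(level, key=level.get):
        children = [c for c in adj[n] if level.get(c) == level[n] + 1]
        spare = max(tickets[n] - 1.0, 0.0)
        if not children or spare == 0.0:
            continue
        share = spare / len(children)
        for c in children:
            tickets[c] += share
            cap[(c, n)] = 1.0 + share  # votes flow from c toward the source
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp max flow over a dict of directed capacities."""
    residual = defaultdict(float, cap)
    nbrs = defaultdict(set)
    for u, v in cap:
        nbrs[u].add(v)
        nbrs[v].add(u)
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and residual[(u, v)] > 1e-9:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        total += push


def collect_votes(adj, source, voters, c_max):
    """Steps 1, 2, 4, 5: how many votes can reach the collector."""
    level = bfs_levels(adj, source)
    cap = assign_capacities(adj, source, level, c_max)
    for v in voters:
        cap[("__votes__", v)] = 1.0  # each voter supplies one unit of flow
    return max_flow(cap, "__votes__", source)


edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "x")]
edges += [("x", f"y{i}") for i in range(5)]  # Sybil region behind attack edge c-x
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
print(collect_votes(adj, "s", ["a", "b", "c"], c_max=3))  # 3.0
print(collect_votes(adj, "s", ["x"] + [f"y{i}" for i in range(5)], c_max=3))  # 1.0
```

In the example, all three honest votes are collected, while the six Sybil votes are squeezed through the single attack edge and only one survives.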

Page 15

Integration of SumUp with MHITS

Page 16

Evaluation

Experimental setup
– Barabási–Albert model for generating the network
– 300 users
– 20 media files (10 known to be fake and 10 known to be authentic)
– 800 ratings
– 3000 trust edges
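A stdlib-only sketch of Barabási–Albert preferential attachment, the network model named in the setup. The slides give no model parameters: m = 10 edges per new node is an assumption that yields 2,900 undirected edges for 300 users (close to, but not exactly, the 3000 trust edges reported), and how trust values and ratings were attached to the generated graph is not stated. `barabasi_albert_edges` is a hypothetical name.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Undirected edge list of a Barabasi-Albert preferential-attachment graph.

    Starts from a star on nodes 0..m; every later node attaches to m distinct
    existing nodes chosen with probability proportional to their degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = [(0, i) for i in range(1, m + 1)]
    # Each node appears once per incident edge, so a uniform draw from this
    # list is a degree-proportional draw.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


trust_edges = barabasi_albert_edges(300, 10, seed=42)
print(len(trust_edges))  # 2900
```

The edge count is always m(n − m): m edges in the seed star plus m edges for each of the n − m − 1 later nodes.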

Page 17

Ratings Distribution

Page 18

Evaluation

Evaluation metrics:
– Precision@K
– Spearman's rank correlation coefficient

$$\text{Precision@}K = \frac{|\text{TopK} \cap \text{TopK}'|}{K}$$

where TopK and TopK' are the two top-K lists being compared.

$$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

ρ_s: Spearman's coefficient of rank correlation, -1 ≤ ρ_s ≤ 1
d_i: the difference between the rank of x_i and the rank of y_i
n: the number of data points in the sample (total number of observations)
ρ_s = -1 or 1: perfect (negative or positive) rank correlation between x and y
ρ_s = 0: no monotonic association between the two variables

Scale: +1 perfect positive correlation, 0 no correlation, -1 perfect negative correlation
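Both metrics are straightforward to compute for two rankings of users. This sketch assumes rankings are given as ordered lists of distinct user IDs (no tied ranks, which the closed-form Spearman formula requires); `precision_at_k` and `spearman_rho` are hypothetical names.

```python
def precision_at_k(ranking, reference, k):
    """Precision@K = |TopK ∩ TopK'| / K for two ranked lists."""
    return len(set(ranking[:k]) & set(reference[:k])) / k


def spearman_rho(ranking, reference):
    """rho_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) for two rankings without ties."""
    n = len(ranking)
    if n < 2 or len(reference) != n or len(set(ranking)) != n \
            or set(ranking) != set(reference):
        raise ValueError("need two rankings of the same n >= 2 distinct items")
    position = {item: i for i, item in enumerate(reference)}
    # d_i: difference between the item's rank in the two lists.
    d_squared = sum((i - position[item]) ** 2 for i, item in enumerate(ranking))
    return 1 - 6 * d_squared / (n * (n * n - 1))


algorithm = ["u1", "u3", "u2", "u4"]
reference = ["u1", "u2", "u3", "u4"]
print(precision_at_k(algorithm, reference, 2))  # 0.5
print(spearman_rho(algorithm, reference))
```

Here one adjacent swap gives d² = 2, so ρ_s = 1 − 12/60 = 0.8.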

Page 19

Experimental Results I

No Sybils: results are compared with a ranking of the users by the number of fair ratings each of them had in the system.

Spearman's ρ (n = 15)
HITS    MHITS
0.87    0.93

Page 20

Experimental Results II

10% Sybils, 4 attack edges

Spearman's ρ (n = 20)
HITS    MHITS    MHITS & SumUp
0.52    0.68     0.93

Page 21

Experimental Results III

Figure panels (Precision@K): 10% Sybils (one group) and 8 attack edges; 20% Sybils (one group) and 24 attack edges

Page 22

Further evaluation
– Number of Sybil votes increased from 3% to 17% of the total number of fair votes: the expertise ranking does not change
– Number of attack edges increased from 9 to 14 and then to 24, keeping the number of Sybil votes at 17% of the number of fair votes and the number of Sybils constant (50): precision does not change
– Number of Sybil votes increased from 17% to 50% and then to 100%, keeping the number of attack edges (24) and the number of Sybils constant

K    MHITS 20%   MHITS & SumUp 20%   MHITS 50%   MHITS & SumUp 50%   MHITS 100%   MHITS & SumUp 100%
12   0.91        0.91                0.27        0.33                0.08         0.08
15   0.93        0.93                0.33        0.40                0.06         0.06

Page 23

Conclusions and Future Work

Conclusions
– Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems)
– Leveraged trust and showed its implications
– Combined expert ranking with Sybil-resistant algorithms

Future Work
– Apply the algorithm to real data and to different data sets
– Temporal analysis (time series analysis)
– Integrate the domain knowledge driven method