Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik University of California, Irvine

Post on 29-Jan-2016

Page 1

Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik

University of California, Irvine

Page 2

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 3

Motivation

Increasing Popularity of Reviewing Sites

Yelp, more than 39M visitors and 15M reviews in 2010

Page 4

Example

[Figure: example review, with its category and rating labeled]

Page 5

Motivation

Rising awareness of privacy

Page 6

Motivation

How is it applied?

Traceability/Linkability

Linkability of Ad hoc Reviews

Linkability of Several Accounts

Page 7

Goal

Assess the linkability in user reviews

Page 8

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 9

Data Set

• 1 million reviews
• 2,000 users
• More than 300 reviews per user

Page 10

Problem Settings

Page 11

Problem Settings

Page 12

Problem Formulation

IR: Identified Record
AR: Anonymous Record

[Diagram: a group of IRs and a group of ARs]

Page 13

Problem Settings

Anonymous Record (AR)
Identified Records (IRs)
Matching Model
Top-X Linkability, X: 1 and 10
1, 5, 10, 20, …, 60
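The Top-X linkability ratio can be read as the fraction of anonymous records whose true author appears among the first X identified records returned by the matching model. The sketch below follows that reading. The function name `top_x_linkability`, the dictionary layout, and the toy user names are all assumptions made for illustration, not details from the slides.

```python
def top_x_linkability(rankings, true_authors, x):
    """Fraction of ARs whose true author is among the first x ranked IRs.

    rankings:     {ar_id: [ir_id, ...]} ordered best match first (assumed layout)
    true_authors: {ar_id: ir_id} ground truth (assumed layout)
    """
    hits = sum(
        1 for ar_id, ranked in rankings.items()
        if true_authors[ar_id] in ranked[:x]
    )
    return hits / len(rankings)


# Toy data, purely hypothetical.
rankings = {
    "ar1": ["alice", "bob", "carol"],
    "ar2": ["carol", "alice", "bob"],
    "ar3": ["bob", "carol", "alice"],
    "ar4": ["bob", "alice", "carol"],
}
true_authors = {"ar1": "alice", "ar2": "alice", "ar3": "alice", "ar4": "bob"}

top1 = top_x_linkability(rankings, true_authors, 1)
top2 = top_x_linkability(rankings, true_authors, 2)
top3 = top_x_linkability(rankings, true_authors, 3)
```

With X equal to the number of candidates the ratio is trivially 1, so the metric is only informative when X is small relative to the number of users.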

Page 14

Methodologies

(1) Naïve Bayesian Model: decreasing sorted list of IRs
(2) Kullback-Leibler Divergence (KLD): increasing sorted list of IRs

Maximum-Likelihood Estimation
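As a rough illustration of the two models over letter distributions: a Naïve Bayes score is a log-likelihood, so candidates are sorted in decreasing order, while KLD is a distance, so candidates are sorted in increasing order. This is a minimal sketch, not the authors' implementation. The add-one smoothing is an assumption introduced here to avoid log(0), and every function name is hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count occurrences of a-z, ignoring everything else."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)


def distribution(counts, smoothing=1.0):
    """Relative letter frequencies; the smoothing term is an assumption."""
    total = sum(counts[ch] for ch in ALPHABET) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def nb_log_likelihood(ar_counts, ir_dist):
    """Log-probability of the AR's letters under an IR's distribution."""
    return sum(ar_counts[ch] * math.log(ir_dist[ch]) for ch in ALPHABET)


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over the alphabet."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in ALPHABET)


def rank_nb(ar_text, ir_texts):
    """IRs sorted by decreasing Naive Bayes log-likelihood."""
    ar = letter_counts(ar_text)
    scores = {
        user: nb_log_likelihood(ar, distribution(letter_counts(text)))
        for user, text in ir_texts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rank_kld(ar_text, ir_texts):
    """IRs sorted by increasing divergence from the AR's distribution."""
    p = distribution(letter_counts(ar_text))
    scores = {
        user: kld(p, distribution(letter_counts(text)))
        for user, text in ir_texts.items()
    }
    return sorted(scores, key=scores.get)


# Toy identified records, purely hypothetical.
ir_texts = {"alice": "aaaa bbbb aaaa", "bob": "zzzz yyyy zzzz"}
nb_ranking = rank_nb("aab", ir_texts)
kld_ranking = rank_kld("aab", ir_texts)
```

Either ranking can then be fed to a Top-X check: the true author is a hit if it appears among the first X entries.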

Page 15

Tokens

• Unigram
  • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”
  • 26 values
• Digram
  • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”
  • 676 values
• Rating
  • 5 values
• Category
  • 28 values
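The lexical tokens above can be extracted in a few lines. This sketch assumes digrams are formed from adjacent letters within a single word and that only the 26 lowercase ASCII letters count; the function names are hypothetical.

```python
import string


def unigrams(word):
    """Single letters of a word, restricted to a-z."""
    return [ch for ch in word.lower() if ch in string.ascii_lowercase]


def digrams(word):
    """Pairs of adjacent letters within the word."""
    letters = unigrams(word)
    return [a + b for a, b in zip(letters, letters[1:])]


unigram_tokens = unigrams("privacy")
digram_tokens = digrams("privacy")

# Size of each token space: 26 letters, 26 * 26 letter pairs.
unigram_values = len(string.ascii_lowercase)
digram_values = unigram_values ** 2
```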

Page 16

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 17

Unigram Results

[Plot: NB-Unigram. x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Size 60: LR 83% (Top-1), LR 96% (Top-10)

Page 18

Digram Results

[Plot: NB-Digram. x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Size 20: LR 97% (Top-1)
Size 10: LR 88% (Top-1)

Page 19

Improvement (1): Combining Lexical and Non-Lexical Tokens

[Plot: NB Model. x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Gain: up to 20%
Size 60: 83% to 96%
Size 30: 60% to 80%
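One way to combine lexical tokens with ratings and categories in a Naïve Bayes model is to treat the token types as independent and add their log-likelihoods. The sketch below takes that approach as an assumption; the slides do not spell out the combination rule. The add-one smoothing, the function names, and the category labels are hypothetical.

```python
import math
from collections import Counter

# Token-space sizes from the Tokens slide: 26 letters, 5 ratings, 28 categories.
VOCAB = {"letters": 26, "ratings": 5, "categories": 28}


def log_likelihood(ar_tokens, ir_tokens, vocabulary_size, smoothing=1.0):
    """Log-probability of the AR tokens under the IR's smoothed frequencies."""
    ir_counts = Counter(ir_tokens)
    total = len(ir_tokens) + smoothing * vocabulary_size
    return sum(math.log((ir_counts[t] + smoothing) / total) for t in ar_tokens)


def combined_score(ar, ir):
    """Sum of per-token-type log-likelihoods (independence is assumed)."""
    return sum(log_likelihood(ar[k], ir[k], VOCAB[k]) for k in VOCAB)


# Two hypothetical users with identical letter usage but different habits.
ir_a = {
    "letters": list("goodfood"),
    "ratings": [5, 5, 4],
    "categories": ["restaurants", "restaurants", "restaurants"],
}
ir_b = {
    "letters": list("goodfood"),
    "ratings": [1, 2, 1],
    "categories": ["automotive", "automotive", "hotels"],
}
ar = {"letters": list("food"), "ratings": [5], "categories": ["restaurants"]}

letters_a = log_likelihood(ar["letters"], ir_a["letters"], VOCAB["letters"])
letters_b = log_likelihood(ar["letters"], ir_b["letters"], VOCAB["letters"])
score_a = combined_score(ar, ir_a)
score_b = combined_score(ar, ir_b)
```

In this toy case the letters alone cannot separate the two candidates, while the rating and category terms break the tie.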

Page 20

What about Restricting Identified Record (IR) Size?

[Plots: NB Model and KLD Model. x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Affected by IR size
Performed better for smaller IR
Size 20 or less, improved

Page 21

Improvement (2): Matching All IRs At Once

[Diagram: nodes v1 to v16, with two cross marks]
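The slide does not state the matching procedure, so the following is only one plausible reading: if every IR belongs to exactly one AR, the pairs can be assigned jointly instead of ranking each AR on its own. The sketch uses a simple greedy one-to-one assignment, which may differ from the authors' method. The function name and the score layout are hypothetical.

```python
def match_all_at_once(scores):
    """Greedy one-to-one assignment of ARs to IRs.

    scores[ar][ir] is a similarity where higher is better (assumed layout).
    The best remaining pair is fixed first, then both are removed from play.
    """
    pairs = sorted(
        ((s, ar, ir) for ar, row in scores.items() for ir, s in row.items()),
        reverse=True,
    )
    assignment, used_irs = {}, set()
    for _, ar, ir in pairs:
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment


# Hypothetical log-likelihood scores: on their own, both ARs prefer u1.
scores = {
    "ar1": {"u1": -1.0, "u2": -1.2},
    "ar2": {"u1": -0.5, "u2": -3.0},
}
independent = {ar: max(row, key=row.get) for ar, row in scores.items()}
joint = match_all_at_once(scores)
```

Matched independently, both anonymous records are assigned to u1. Matched jointly, ar2 takes u1 as its stronger match and ar1 falls back to u2.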

Page 22

Matching All Results

[Plots: Restricted IR and Full IR. x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Gain: up to 16%
Size 30: from 74% to 90%

Gain: up to 23%
Size 20: from 35% to 55%

Page 23

Improvement (3): For Small IR Size

Changing it to: 0.5 + Review Length

[Plot: x-axis: Anonymous Record Size; y-axis: Linkability Ratio]

Size 10: 89% to 92%
Size 7: 79% to 84%
Gain: up to 5%

Page 24

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 25

Discussion

o Unigram and Scalability
  o 26 vs. 676
  o 59 vs. 676
  o Less than 10%
o Prolific Users
  o In the long run, will be prolific
o Anonymous Record Size
  o A set of 60 reviews, less than 20% of minimum contribution
o Detecting Spam Reviews

Page 26

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 27

Future Work

o Improving more for Small ARs
o Other Probabilistic Models
o Using Stylometry
o Review Anonymization
o Exploring Linkability in other Preference Databases

Page 28

Conclusion

o Extensive Study to Assess Linkability of User Reviews
  o For a large set of users
  o Using very simple features
o Users are very exposed, even with simple features and a large number of authors

Takeaway Point: Reviews can be accurately de-anonymized using alphabetical letter distributions

Page 29

Questions?