Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik

University of California, Irvine

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Motivation

Increasing Popularity of Reviewing Sites

Yelp, more than 39M visitors and 15M reviews in 2010

Example

[Figure: example review, with labels marking its category and rating]

Motivation

Rising awareness of privacy

Motivation

How is it applied?

Traceability/Linkability

Linkability of Ad hoc Reviews

Linkability of Several Accounts

Assess the linkability in user reviews

Roadmap

Data Set

• 1 Million Reviews
• 2000 Users
• More than 300 reviews per user

Problem Settings

IR: Identified Record

AR: Anonymous Record

Problem Formulation

Anonymous Record (AR)

Identified Records (IR’s)

Matching Model

Top-X Linkability, X: 1 and 10

1, 5, 10, 20,…60
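
A minimal sketch of how Top-X linkability could be measured, assuming each anonymous record comes with a ranked list of candidate authors (best match first). The function name `top_x_linkability` and the toy rankings are hypothetical and are not taken from the slides.

```python
def top_x_linkability(ranked_lists, true_authors, x):
    """Fraction of anonymous records whose true author is among the first x candidates."""
    hits = sum(
        1
        for ranking, author in zip(ranked_lists, true_authors)
        if author in ranking[:x]
    )
    return hits / len(true_authors)


# Hypothetical toy data: one ranked candidate list per anonymous record.
rankings = [
    ["alice", "bob", "carol"],
    ["carol", "alice", "bob"],
    ["bob", "carol", "alice"],
    ["alice", "carol", "bob"],
]
truth = ["alice", "alice", "carol", "bob"]

top1 = top_x_linkability(rankings, truth, 1)
top2 = top_x_linkability(rankings, truth, 2)
top3 = top_x_linkability(rankings, truth, 3)
```

With X = 1 only an exact first-place match counts; a larger X credits any record whose true author lands in the first X positions.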

Problem Settings

Methodologies

(1) Naïve Bayesian Model

(2) Kullback-Leibler Divergence (KLD)

Decreasing Sorted List of IRs

Increasing Sorted List of IRs

Maximum-Likelihood Estimation
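
The two matching models can be sketched over letter counts as follows. This is an illustration under stated assumptions, not the authors' implementation: probabilities are estimated from counts, add-one smoothing is added here only to avoid zero probabilities (with `smoothing=0` the estimate is the plain maximum-likelihood one), and all function names are hypothetical. Naïve Bayes ranks IRs by decreasing likelihood of the AR; KLD ranks them by increasing divergence from the AR.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def letter_counts(text):
    """Count the letters a-z in a text, ignoring everything else."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)


def estimate_distribution(counts, smoothing=1.0):
    """Letter probabilities from counts (assumption: add-one smoothing)."""
    total = sum(counts[ch] for ch in ALPHABET) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def nb_log_likelihood(ar_counts, ir_dist):
    """Log-probability of the AR's letters under an IR's distribution."""
    return sum(n * math.log(ir_dist[ch]) for ch, n in ar_counts.items())


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over the alphabet."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in ALPHABET if p[ch] > 0)


def rank_nb(ar_text, irs):
    """IR ids sorted by decreasing Naive Bayes log-likelihood of the AR."""
    ar_counts = letter_counts(ar_text)
    scores = {
        user: nb_log_likelihood(ar_counts, estimate_distribution(letter_counts(text)))
        for user, text in irs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rank_kld(ar_text, irs):
    """IR ids sorted by increasing divergence from the AR's distribution."""
    ar_dist = estimate_distribution(letter_counts(ar_text))
    scores = {
        user: kld(ar_dist, estimate_distribution(letter_counts(text)))
        for user, text in irs.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical toy records.
irs = {"user_a": "aaaa aaab", "user_b": "zzzz zzzy"}
ar = "aaab"
nb_ranking = rank_nb(ar, irs)
kld_ranking = rank_kld(ar, irs)
```

Both rankings place `user_a` first here, since the anonymous text shares its letter profile.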

Tokens

• Unigram
  • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”
  • 26 values
• Digram
  • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”
  • 676 values
• Rating
  • 5 values
• Category
  • 28 values
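
The lexical tokens above can be extracted with a few lines of Python. This sketch assumes lowercase a-z only and forms digrams within a word, never across word boundaries; the slides do not state how word boundaries are handled, so that choice is an assumption. The helper names are hypothetical.

```python
import re
import string

LETTERS = string.ascii_lowercase


def words(text):
    """Maximal runs of letters a-z, lowercased."""
    return re.findall(r"[a-z]+", text.lower())


def unigrams(text):
    """Single-letter tokens."""
    return [ch for word in words(text) for ch in word]


def digrams(text):
    """Adjacent letter pairs inside each word (assumption: no pairs across words)."""
    return [word[i:i + 2] for word in words(text) for i in range(len(word) - 1)]


unigram_tokens = unigrams("privacy")
digram_tokens = digrams("privacy")

# Sizes of the token spaces: 26 letters and 26 * 26 letter pairs.
unigram_space = len(LETTERS)
digram_space = len(LETTERS) ** 2
```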

Roadmap

Unigram Results

[Plot: NB - Unigram; x-axis: Anonymous Record Size]

Size 60, LR 83% / Top-1, LR 96% / Top-10

Digram Results

[Plot: NB - Digram]

Size 20, LR 97% / Top-1

Size 10, LR 88% /

Improvement (1): Combining Lexical and non-Lexical ones

NB Model

Gain, up to 20%

Size 60, 83 % To

Size 30, 60 % To
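
The slides do not say how the lexical and non-lexical tokens are combined. One natural reading under the Naïve Bayes independence assumption is to add the per-feature log-likelihoods, sketched below for letters and ratings (category would be handled the same way). The function names, the add-one smoothing, and the toy records are all assumptions made for illustration.

```python
import math
import string
from collections import Counter


def log_likelihood(observed, profile_counts, domain, smoothing=1.0):
    """Log-probability of observed token counts under a smoothed profile."""
    total = sum(profile_counts[v] for v in domain) + smoothing * len(domain)
    return sum(
        n * math.log((profile_counts[v] + smoothing) / total)
        for v, n in observed.items()
    )


def combined_score(ar, ir, domains):
    """Sum of per-feature log-likelihoods (features treated as independent)."""
    return sum(log_likelihood(ar[f], ir[f], domains[f]) for f in domains)


domains = {
    "letter": list(string.ascii_lowercase),  # 26 values
    "rating": [1, 2, 3, 4, 5],               # 5 values
}

# Hypothetical records: identical letter usage, different rating habits.
ir_a = {"letter": Counter("greatfood"), "rating": Counter({5: 8, 4: 2})}
ir_b = {"letter": Counter("greatfood"), "rating": Counter({1: 8, 2: 2})}
ar = {"letter": Counter("good"), "rating": Counter({5: 2})}

lexical_a = log_likelihood(ar["letter"], ir_a["letter"], domains["letter"])
lexical_b = log_likelihood(ar["letter"], ir_b["letter"], domains["letter"])
score_a = combined_score(ar, ir_a, domains)
score_b = combined_score(ar, ir_b, domains)
```

In this toy case the letters alone cannot separate the two candidates, while the added rating term favors `ir_a`.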

What about Restricting Identified Record (IR) Size?

NB Model KLD Model

o Anonymous Record

Affected by IR size

Performed better for smaller IR

Size 20 or less, improved

Improvement (2): Matching All IR’s At Once

Matching All Results

Restricted IR Full IR

Gain, up to 16%

Size 30, From 74% To 90%

Gain, up to 23%

Size 20, From 35% To 55%

Improvement (3): For Small IR Size

Changing it to: 0.5 + Review Length

o Size 10, 89% To 92%

Size 7, 79% To 84%

Gain up to 5%

Roadmap

Discussion

o Unigram and Scalability
  o 26 vs. 676
  o 59 vs. 676
  o Less than 10%
o Prolific Users
  o In the long run, users will be prolific
o Anonymous Record Size
  o A set of 60 reviews is less than 20% of the minimum contribution
o Detecting Spam Reviews

Roadmap

Future Work

o Improving more for Small AR’s
o Other Probabilistic Models
o Using Stylometry
o Review Anonymization
o Exploring Linkability in other Preference Databases

Conclusion

o Extensive Study to Assess Linkability of User Reviews
  o For a large set of users
  o Using very simple features
o Users are very exposed, even with simple features and a large number of authors

Takeaway Point:

Reviews can be accurately de-anonymized using alphabetical letter distributions

Questions?
