Robust Spammer Detection by Nash Reinforcement Learning
Yingtong Dou (UIC), Guixiang Ma (Intel Labs), Sihong Xie (Lehigh), Philip S. Yu (UIC)
ACM SIGKDD '20, August 23-27, Virtual Event, CA, USA
ydou5@uic.edu
Paper: http://arxiv.org/abs/2006.06069
Slides: http://ytongdou.com/files/kdd20slides.pdf
Code: https://github.com/YingtongDou/Nash-Detect
Outline
• Background: review spam and spamming campaigns
• Highlight: previous works vs. our work
• Methodology I: practical goals of spammers and defenders
• Methodology II: robust training of spam detectors (Nash-Detect)
• Experiments: training and deployment performance of Nash-Detect
• Conclusion & future works
Robust Spammer Detection by Nash Reinforcement Learning, KDD 2020
Fake Reviews are Prevalent
• Nearly 40% of reviews on Amazon are fake [1]
• Yelp hides suspicious reviews and alerts consumers
Images from https://upserve.com/restaurant-insider/five-key-reasons-shouldnt-buy-yelp-reviews/ and http://greyenlightenment.com/detecting-fake-amazon-reviews/
[1] J. Swearingen. 2017. Amazon Is Filled With Sketchy Reviews. Here's How to Spot Them. https://slct.al/2TBXDpT
Spamming Campaign
• Dishonest merchants can easily buy high-quality fake reviews online
• Machine-generated fake reviews look highly authentic [1]
Images from https://mopeak.com/buy-android-reviews/ and http://faculty.cs.tamu.edu/caverlee/pubs/kaghazgaran19cikm.pdf
[1] P. Kaghazgaran, M. Alfifi, and J. Caverlee. 2019. Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures. In CIKM.
Review Spam Detection
• To detect fake reviews, three major types of spam detectors have been proposed: text-based detectors, behavior-based detectors, and graph-based detectors
Base Spam Detectors
• MRF-based detectors: GANG, SpEagle
• SVD-based detector: fBox
• Dense-block-based detector: Fraudar
• Behavior-based detector: Prior
Previous Works vs. Our Work
• Previous works:
  • Static dataset
  • Accuracy-based evaluation metric
  • Fixed spamming pattern
  • Single detector
• Our work:
  • Dynamic game between spammer and defender
  • Practical evaluation metric
  • Evolving spamming strategies
  • Ensemble of multiple detectors
Turning Reviews into Business Revenues
• On Yelp, a product's rating is correlated with its revenue [1]
Revenue Estimation & Practical Effect:
[1] M. Luca. 2016. Reviews, Reputation, and Revenue: The Case of Yelp.com. HBS Working Paper.
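To make the rating-revenue link concrete, here is a hypothetical sketch of revenue estimation from ratings. The 5%-per-star coefficient and all values are assumptions for illustration, not figures from the paper.

```python
# Hypothetical sketch: estimate the fractional revenue change implied by a
# change in a product's average rating, assuming roughly a 5% revenue change
# per star (the coefficient is an illustrative assumption).
def avg_rating(ratings):
    return sum(ratings) / len(ratings)

def revenue_lift(ratings_before, ratings_after, pct_per_star=0.05):
    """Fractional revenue change implied by the change in average rating."""
    return pct_per_star * (avg_rating(ratings_after) - avg_rating(ratings_before))

before = [3, 4, 2, 5]            # average rating 3.5
after = before + [5, 5, 5, 5]    # injected fake 5-star reviews raise it to 4.25
print(round(revenue_lift(before, after), 4))
```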
Practical Effect is Better than Recall
• We run five detectors individually against five attacks
• Even when detector recalls are high (>0.7), the practical effects are not reduced
[Figure: practical effect vs. recall for five detectors (SpEagle, Fraudar, GANG, Prior, fBox) on YelpChi, YelpNYC, and YelpZip]
Spammer's Practical Goal
• To promote a product, the practical goal of the spammer is to maximize the practical effect (PE): the revenue after attacks minus the revenue before attacks, optimized over the spamming strategy weights.
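The spammer's objective above can be sketched in a few lines; the revenue values below are invented for illustration, not taken from the paper.

```python
# Minimal sketch of the practical effect (PE) defined above: the total
# revenue gain of the target products caused by the attack.
def practical_effect(revenue_before, revenue_after):
    """PE = sum over target products of (revenue after - revenue before)."""
    return sum(after - before
               for before, after in zip(revenue_before, revenue_after))

print(practical_effect([10.0, 20.0], [12.5, 21.5]))  # 2.5 + 1.5 = 4.0
```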
Defender's Practical Goal
• We combine the detectors' prediction results with the practical effect to formulate a cost-sensitive loss, in which the detectors' predictions are combined with detector weights and false negatives are charged their practical-effect cost
• The defender needs to minimize the practical effect
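A hedged sketch of such a cost-sensitive loss: each detector's score is combined via detector weights, and fake reviews that survive filtering (false negatives) are charged their practical-effect cost. All variable names are illustrative, not the paper's notation.

```python
# Illustrative cost-sensitive loss: false negatives (surviving fake reviews)
# are penalized in proportion to their practical-effect cost.
def cost_sensitive_loss(detector_scores, detector_weights, is_removed, fn_costs):
    """Sum the cost of every fake review the weighted ensemble fails to remove."""
    loss = 0.0
    for scores, removed, cost in zip(detector_scores, is_removed, fn_costs):
        # Weighted ensemble suspiciousness of this fake review
        ensemble = sum(w * s for w, s in zip(detector_weights, scores))
        if not removed:  # false negative: the fake review stays posted
            loss += cost * (1.0 - ensemble)
    return loss

# Two fake reviews scored by two detectors; only the first one gets removed
scores = [[0.9, 0.8], [0.2, 0.1]]
loss = cost_sensitive_loss(scores, [0.5, 0.5], [True, False], [1.0, 2.0])
print(round(loss, 2))  # only the surviving review contributes
```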
A Minimax-Game Formulation
• Minimax game objective: the defender minimizes the maximum practical effect the spammer can attain
• The objective function is not differentiable
• Our solution: multi-agent non-cooperative reinforcement learning with SGD optimization
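The alternating optimization can be sketched as a toy loop, not the paper's implementation: the spammer keeps a mixture over attack strategies, reinforced by the practical effect it earns, while the defender takes gradient-style steps on its detector weights. The payoff table `pe` and all step sizes below are invented purely for illustration.

```python
import random

# Toy sketch of the alternating training loop; payoffs are made up.
random.seed(0)

strategies = ["IncBP", "IncDS", "Random"]
# pe[s][i]: hypothetical practical effect of strategy s against detector i
pe = {"IncBP": [1.0, 3.0], "IncDS": [4.0, 1.0], "Random": [2.0, 2.0]}

attack_w = {s: 1.0 for s in strategies}  # unnormalized attack mixture
detect_w = [0.5, 0.5]                    # weights for two detectors

def normalize(ws):
    total = sum(ws)
    return [w / total for w in ws]

for episode in range(200):
    # Spammer: sample a strategy from the current mixture
    s = random.choices(strategies, weights=list(attack_w.values()))[0]
    # Practical effect earned under the current detector configuration
    reward = sum(w * pe[s][i] for i, w in enumerate(detect_w))
    attack_w[s] *= 1 + 0.05 * reward     # reinforce effective attacks
    # Defender: push down the weights of detectors that let PE through
    detect_w = normalize([max(1e-6, w - 0.01 * pe[s][i])
                          for i, w in enumerate(detect_w)])

print([round(w, 3) for w in detect_w])
```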
Train a Robust Detector: Nash-Detect
Base Spamming Strategies
• IncBP: add reviews with minimum suspiciousness based on belief propagation on an MRF
• IncDS: add reviews with minimum density on the graph composed of accounts, reviews, and products
• IncPR: add reviews with minimum prior suspicious scores computed from behavior features
• Random: randomly add reviews
• Singleton: add reviews with new accounts
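The two simplest strategies above can be sketched directly; the account and product IDs are made up. Each strategy returns (account, product) pairs identifying the fake reviews to post.

```python
import random

# Illustrative sketch of the Random and Singleton strategies; IDs are invented.
random.seed(42)

controlled_accounts = ["elite_1", "elite_2", "elite_3"]
target_products = ["prod_a", "prod_b"]

def random_strategy(n):
    """Random: reuse existing controlled accounts, chosen at random."""
    return [(random.choice(controlled_accounts), random.choice(target_products))
            for _ in range(n)]

def singleton_strategy(n):
    """Singleton: every fake review is posted from a brand-new account."""
    return [(f"new_{i}", random.choice(target_products)) for i in range(n)]

print(len(random_strategy(4)), len(singleton_strategy(4)))
```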
Experimental Settings
• Dataset statistics and spamming attack settings:
Dataset   # Accounts  # Products  # Reviews  # Controlled elite accounts  # Target products  # Posted fake reviews
YelpChi   38,063      201         67,395     100                          30                 450
YelpNYC   160,225     923         359,052    400                          120                1,800
YelpZip   260,277     5,044       608,598    700                          600                9,000
• The spammer controls elite and new accounts
• The defender removes top k suspicious reviews
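The defense step above can be sketched as a simple top-k filter over ensemble suspiciousness scores; the scores here are invented.

```python
# Sketch of the defense: rank reviews by weighted ensemble suspiciousness
# and remove the top-k most suspicious ones.
def remove_top_k(scores, k):
    """Return the indices of the k most suspicious reviews."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

ensemble_scores = [0.1, 0.9, 0.4, 0.8, 0.2]
print(sorted(remove_top_k(ensemble_scores, 2)))  # reviews 1 and 3 are removed
```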
Fixed Detector’s Vulnerability
[Figure: practical effect of Fraudar under the five spamming strategies (IncBP, IncDS, IncPR, Random, Singleton)]
• Against a fixed detector (Fraudar), the spammer can switch to the spamming strategy with the maximum practical effect (IncDS)
Nash-Detect Training Process
[Figure: attack mixture parameters over training episodes for IncBP, IncDS, IncPR, Random, and Singleton on YelpChi, YelpNYC, and YelpZip]
• The Singleton attack is less effective than the other four attacks
Nash-Detect Training Process
[Figure: detector importance over training episodes for SpEagle, Fraudar, GANG, Prior, and fBox on YelpChi, YelpNYC, and YelpZip]
• Nash-Detect can find the optimal detector importance smoothly
[Figure: practical effect over training episodes for Nash-Detect and the base detectors (fBox, SpEagle, Fraudar, Prior, GANG) on YelpChi, YelpNYC, and YelpZip]
Nash-Detect Training Process
• The practical effect under the detector configurations found by Nash-Detect is always lower than the worst-case performance
Nash-Detect Performance in Deployment
[Figure: practical effect in deployment for Equal-Weights, Nash-Detect, GANG, Prior, SpEagle, fBox, and Fraudar on YelpChi, YelpNYC, and YelpZip]
Key Takeaways
• A new evaluation metric: the practical effect
• New spamming strategies
• A new adversarial training algorithm: Nash-Detect
Future Works
• Investigate attacks on and defenses of deep-learning-based spam detection methods
• Apply the Nash-Detect framework to other review systems and applications
• Develop advanced attack generation techniques aware of the state of the review system