Securing Web Service by Automatic Robot Detection
KyoungSoo Park, Vivek S. Pai
Princeton University
Kang-Won Lee, Seraphin Calo
IBM T.J. Watson Research Center
KyoungSoo Park USENIX 2006 2
Web Robots
• Automatic agents
  • Web crawlers
  • URL link checkers
• Malicious robots are widespread
  • Password cracking
  • Referrer/blog spamming
  • Click fraud on Google search
  • Burning CPU with heavy CGI queries
Contributions
• Real-time robot detector
• Fast detection
  • 80% of sessions detected within 20 requests, 95% within 57 requests
• High accuracy
  • 2.4% maximum false positive rate
• Low overhead
  • ~200 µs additional delay per page
• Easy deployment
Operational Scenario
• Server-side
  • Site web server
  • Many-to-one
• Client-side
  • Firewall/proxies on the LAN
  • Many-to-many
[Diagram: MON sits between Clients and Servers, deployed either in the server infrastructure or in the client infrastructure]
Design Goals
• Transparency
  • No human intervention
• Accuracy
  • Minimal false positives
• Real-time proof
  • Periodic checks should be possible
  • Authentication or CAPTCHA alone is not enough
• Practicality
Observation & Intuition
Robot behavior
• Custom program
• Goal-oriented
• No embedded objects
• No index file
• Follow hidden links
• No hardware (mouse/keyboard) events

Human behavior
• Standard browsers
• Browsing purpose
• Cascading style sheets
• Images
• Never follow hidden links
• Mouse and keyboard events

⇒ Humans are easier to detect
Browser Detection
• “No standard browser” ⇒ robot
• “User-Agent” HTTP header? Trivially forged by any client, so not reliable
• Use behavioral artifacts instead (dynamic page modifications)
  • Redundant embedded objects
    • Empty cascading style sheet (CSS)
    • Invisible images (1x1 JPEG) or mute sounds
  • Hidden links
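A minimal Python sketch of the behavioral-artifact idea, assuming a hypothetical detector that rewrites each HTML page to reference a per-page, never-reused CSS URL and an invisible link, then labels the session by which of those URLs the client later fetches. The `/__probe/` prefix, the `BrowserProbe` class, and the verdict labels are illustrative assumptions, not CoDeeN's actual implementation; images and sounds are omitted for brevity.

```python
import secrets


class BrowserProbe:
    """Hypothetical per-session tracker for injected probe objects."""

    PREFIX = "/__probe/"  # assumed URL prefix reserved for probe objects

    def __init__(self):
        self.tokens = {}   # token -> (session_id, kind) for every probe URL handed out
        self.fetched = {}  # session_id -> set of probe kinds the client fetched

    def _new_url(self, session_id, kind, ext):
        token = secrets.token_hex(8)  # unguessable and unique per page view
        self.tokens[token] = (session_id, kind)
        return f"{self.PREFIX}{token}.{ext}"

    def rewrite(self, session_id, html):
        """Inject an empty stylesheet into <head> and an invisible link into <body>."""
        css = self._new_url(session_id, "css", "css")
        trap = self._new_url(session_id, "hidden_link", "html")
        html = html.replace(
            "</head>", f'<link rel="stylesheet" href="{css}"></head>', 1)
        html = html.replace(
            "</body>", f'<a href="{trap}" style="display:none"></a></body>', 1)
        return html

    def on_request(self, path):
        """Record a fetch of a probe URL; return True if the path was a probe URL."""
        if not path.startswith(self.PREFIX):
            return False
        token = path[len(self.PREFIX):].split(".", 1)[0]
        entry = self.tokens.pop(token, None)  # one-shot: a token cannot be replayed
        if entry is not None:
            session_id, kind = entry
            self.fetched.setdefault(session_id, set()).add(kind)
        return True

    def verdict(self, session_id):
        kinds = self.fetched.get(session_id, set())
        if "hidden_link" in kinds:
            return "robot"      # humans never follow invisible links
        if "css" in kinds:
            return "browser"    # standard browsers fetch stylesheets
        return "undecided"      # not enough evidence yet
```

Because each probe URL carries a fresh random token, a robot cannot simply request a known stylesheet path to pass as a browser; it has to parse and fetch the injected object like a real browser would.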
Human Activity Detection
• Human activities ⇒ human
• Mouse/keyboard event tracking
  • Most robots don't generate hardware events
• Dynamically embed JavaScript code
  • A MouseMove event triggers the event handler
  • The event handler fetches a fake image
  • The code is semantically and lexically obfuscated
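The mouse-tracking mechanism can be sketched in Python on the server side, assuming a hypothetical `MouseActivityProbe` that emits a `<script>` whose `onmousemove` handler loads a fake image from a one-time URL, and marks the session human when that URL arrives. The `/__img/` prefix and class name are assumptions; the JavaScript is deliberately left un-obfuscated for readability, unlike the obfuscated code the slide describes.

```python
import secrets


class MouseActivityProbe:
    """Hypothetical tracker: a session counts as human once its MouseMove beacon arrives."""

    def __init__(self, prefix="/__img/"):
        self.prefix = prefix  # assumed URL prefix for the fake beacon image
        self.pending = {}     # token -> session_id, for beacons not yet fetched
        self.human = set()    # sessions that produced a mouse event

    def snippet(self, session_id):
        """Return a <script> tag that fetches a fake image on the first mouse move."""
        token = secrets.token_hex(8)
        self.pending[token] = session_id
        url = f"{self.prefix}{token}.jpg"
        # Plain JavaScript for readability; a deployed version would obfuscate it.
        return (
            "<script>"
            "document.onmousemove=function(){"
            "document.onmousemove=null;"  # fire only once per page
            f"(new Image()).src='{url}';"
            "};"
            "</script>"
        )

    def on_request(self, path):
        """Mark the owning session human if this is a valid, unused beacon URL."""
        if not path.startswith(self.prefix):
            return False
        token = path[len(self.prefix):].split(".", 1)[0]
        session_id = self.pending.pop(token, None)
        if session_id is not None:
            self.human.add(session_id)
        return True

    def is_human(self, session_id):
        return session_id in self.human
```

A robot that only parses HTML never runs the handler, so its beacon URL is never requested; guessing another valid URL is impractical because each token is random and usable once.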
Test with CoDeeN
• CoDeeN (http://codeen.cs.princeton.edu/)
  • Pull-based CDN running on PlanetLab for over 3 years
  • 25+ million requests from 50K clients per day
  • Malicious robots seeking abuse
• Results are from a 1-week measurement
  • But the changes are now permanent
Main Result
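[Figure: breakdown of CoDeeN client sessions]
• Robots (no CSS fetch): 71.1%
• CSS fetch: 28.9%
  • JavaScript executed: 27.1%
    • MouseMove observed: 22.3%
    • JavaScript but no MouseMove: robots
• Not sure, but human: potential false positives, 1.9%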
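The breakdown above can be read as a decision order over three signals. A minimal Python sketch, assuming a hypothetical `SessionSignals` record; mapping CSS-only sessions (stylesheet fetched, no JavaScript) to the "not sure" bucket is our reading of the figure, not something the slide states.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Hypothetical per-session evidence collected by the detector."""
    css_fetched: bool = False
    js_executed: bool = False
    mouse_moved: bool = False


def classify(s: SessionSignals) -> str:
    """Map observed signals to the categories in the breakdown (a sketch)."""
    if s.mouse_moved:
        return "human"     # MouseMove observed
    if not s.css_fetched:
        return "robot"     # never behaved like a standard browser
    if s.js_executed:
        return "robot"     # ran JavaScript but produced no mouse event
    return "not sure"      # CSS only: counted with robots, but possibly human
```

Checking `mouse_moved` first reflects the slides' framing that positive evidence of a human is the strongest signal; everything without it falls on the robot side.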
Main Result (cont.)
• Max false positive rate = FP / negatives = potential FP / robots = 1.9% / 77.7% ≈ 2.4%
  • Robots = all sessions without a MouseMove event: 100% − 22.3% = 77.7%
• Only 9% passed the (optional) CAPTCHA
• Only 0.9% followed hidden links
How Fast Can We Detect?
[Figure: CDF of the fraction of sessions detected within X requests (x-axis: number of requests required to detect; y-axis: fraction of sessions, 0 to 1), with curves for the CSS file, JavaScript files, and mouse events. 80% of sessions are detected within 20 requests, 95% within 57 requests.]
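The 80%/95% figures are read off an empirical CDF like the one above. A minimal Python sketch of that computation, assuming a hypothetical input list giving, for each detected session, how many requests it took before the session was classified; the sample numbers are made up for illustration and are not CoDeeN measurements.

```python
from bisect import bisect_right


def detection_cdf(detect_at, x):
    """Fraction of sessions detected within x requests (empirical CDF at x).

    detect_at: sorted list giving, for each detected session, the number of
    requests it took before the session was classified.
    """
    return bisect_right(detect_at, x) / len(detect_at)


def requests_for_percent(detect_at, percent):
    """Smallest X such that at least `percent`% of sessions are detected within X requests."""
    n = len(detect_at)
    for i, x in enumerate(detect_at, start=1):
        if i * 100 >= percent * n:  # integer comparison avoids float rounding
            return x
    return None                     # only reached if percent > 100


# Made-up sample data, for illustration only (not CoDeeN measurements).
sample = sorted([3, 1, 8, 2, 21, 5, 13, 89, 34, 55])
print(detection_cdf(sample, 20))         # 0.6
print(requests_for_percent(sample, 80))  # 34
print(requests_for_percent(sample, 95))  # 89
```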
Number of CoDeeN Abuse Complaints
[Figure: monthly count of CoDeeN abuse complaints, Jan 2005 to May 2006 (y-axis: 0 to 10 complaints), with markers for Browser Detection and Human Activity Detection]
Limitations
• Defeating browser detection
  • Behave exactly like a standard browser
• Human activity detection
  • Robots generating mouse/key events
  • Clients that disable JavaScript: 4%
• Solution
  • Ensemble techniques
Machine Learning (AdaBoost)
[Figure: AdaBoost accuracy (%, roughly 88 to 96) versus number of requests (20 to 160), for the train set and the test set]
Three most effective attributes:
1. RESPONSE CODE 300 %
2. REFERRER %
3. UNSEEN REFERRER %

Drawbacks:
1. Heavy computation/memory
2. Patterns may change
3. Requires human intervention
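Per-session attributes like these are what a boosted classifier such as AdaBoost consumes. A minimal Python sketch of extracting them from a request log, under assumed definitions: "RESPONSE CODE 300 %" is taken as the share of 3xx responses, and "UNSEEN REFERRER %" as the share of requests whose referrer was never requested earlier in the session. Both definitions, the `Request` record, and the key names are hypothetical; no classifier is shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    url: str
    status: int
    referrer: Optional[str]  # None if no Referer header was sent


def session_features(reqs: List[Request]) -> dict:
    """Per-session attribute percentages (hypothetical definitions)."""
    n = len(reqs)
    if n == 0:
        return {"resp_3xx_pct": 0.0, "referrer_pct": 0.0, "unseen_referrer_pct": 0.0}
    seen = set()  # URLs this session has already requested
    redirects = with_ref = unseen_ref = 0
    for r in reqs:
        if 300 <= r.status < 400:
            redirects += 1
        if r.referrer:
            with_ref += 1
            if r.referrer not in seen:
                unseen_ref += 1  # referrer page was never fetched in this session
        seen.add(r.url)
    return {
        "resp_3xx_pct": 100.0 * redirects / n,
        "referrer_pct": 100.0 * with_ref / n,
        "unseen_referrer_pct": 100.0 * unseen_ref / n,
    }
```

Recomputing these percentages as each request arrives is one reason the slide lists heavy computation and memory as a drawback: the detector must keep per-session URL history.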
Conclusions
• Practical robot detection tool
• Detect humans by
  • Standard browser behavior
  • Human activities
• “Arms race” in the end
  • Turing test
  • Most simple bots screened out
• Ensemble techniques are promising