
Securing Web Service by Automatic Robot Detection

KyoungSoo Park, Vivek S. Pai, Princeton University

Kang-Won Lee, Seraphin Calo, IBM T.J. Watson Research Center


KyoungSoo Park USENIX 2006 2

Web Robots

• Automatic agents
  • Web crawlers
  • URL link checkers

• Malicious robots are widespread
  • Password cracking
  • Referrer/blog spamming
  • Click fraud on Google search
  • Burning CPU with heavy CGI queries


Contributions

• Real-time robot detector
• Fast detection
  • 80% of sessions detected within 20 requests, 95% within 57 requests
• High accuracy
  • 2.4% maximum false positive rate
• Low overhead
  • ~200 µs additional delay per page
• Easy deployment


Operational Scenario

• Server-side
  • Site web server
  • Many-to-one
• Client-side
  • Firewalls/proxies on the LAN
  • Many-to-many

[Figure: MON placed between Clients and Servers, shown in two panels labeled "Server infrastructure" and "Client infrastructure"]


Design Goals

• Transparency
  • No human intervention
• Accuracy
  • Minimal false positives
• Real-time proof
  • Periodic checks should be possible
  • Authentication or CAPTCHA alone is not enough
• Practicality


Observation & Intuition

Robot behavior
• Custom program
• Goal-oriented
• No embedded objects
• No index file
• Follow hidden links
• No hardware (HW) events

Human behavior
• Standard browsers
• Browsing purpose
• Cascading style sheets
• Images
• Never follow hidden links
• Mouse and keyboard

Humans are easier to detect.


Browser Detection

• “No standard browser” ⇒ robot
• “User-Agent” HTTP header?
• Use behavioral artifacts (dynamic page modifications)
  • Redundant embedded objects
    • Empty cascading style sheet (CSS)
    • Invisible images (1×1 JPEG) or mute sounds
  • Hidden links
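The dynamic page modification can be sketched as a small rewriting step. This is a minimal sketch under stated assumptions, not the CoDeeN implementation: it assumes a server-side hook that sees each outgoing HTML page and each incoming request path, and the `/__probe` URL prefix, the function names, and the in-memory `probes`/`sessions` dictionaries are all hypothetical. Each page gets an empty stylesheet link and an invisible link with random per-session names; fetching the stylesheet suggests a standard browser, while following the hidden link suggests a robot.

```python
import re
import secrets

# Hypothetical URL prefix for probe objects (not taken from the talk).
PROBE_PREFIX = "/__probe"

def inject_probes(html: str, session_id: str, probes: dict) -> str:
    """Insert an empty-CSS link and a hidden link with random, per-session names."""
    css_path = f"{PROBE_PREFIX}/{secrets.token_hex(8)}.css"
    trap_path = f"{PROBE_PREFIX}/{secrets.token_hex(8)}.html"
    probes[css_path] = (session_id, "css")
    probes[trap_path] = (session_id, "hidden_link")
    css_tag = f'<link rel="stylesheet" href="{css_path}">'
    # Present in the HTML a crawler parses, but never rendered for a human.
    trap_tag = f'<a href="{trap_path}" style="display:none"></a>'
    html = re.sub(r"</head>", lambda m: css_tag + m.group(0), html, count=1, flags=re.IGNORECASE)
    html = re.sub(r"</body>", lambda m: trap_tag + m.group(0), html, count=1, flags=re.IGNORECASE)
    return html

def record_fetch(path: str, probes: dict, sessions: dict) -> None:
    """Remember which probe objects a session actually requested."""
    hit = probes.get(path)
    if hit is None:
        return  # not a probe URL
    session_id, kind = hit
    sessions.setdefault(session_id, set()).add(kind)

def looks_like_browser(session_id: str, sessions: dict) -> bool:
    """Fetched the stylesheet and never touched the hidden link."""
    kinds = sessions.get(session_id, set())
    return "css" in kinds and "hidden_link" not in kinds
```

Randomizing the probe names keeps a robot from simply skipping a fixed, well-known probe URL; a real deployment would also have to answer each probe path with an empty response.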


Human Activity Detection

• Human activity ⇒ human
• Mouse/keyboard event tracking
  • Most robots don’t generate hardware (HW) events
• Dynamically embedded JavaScript code
  • A MouseMove event triggers the event handler
  • The event handler fetches a fake image
  • The code is semantically and lexically obfuscated
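A minimal sketch of the MouseMove probe, assuming a server that embeds a per-page `<script>` fragment: the generated handler requests a fake image the first time the mouse moves, so the server only has to log that request. `make_mousemove_snippet` and the beacon URL are hypothetical. Renaming identifiers on every page is only a simple form of lexical obfuscation; the semantic obfuscation mentioned on the slide is not reproduced here.

```python
import json
import secrets
import string

def _random_identifier(length: int = 8) -> str:
    # Leading underscore guarantees a valid JS identifier that is never a keyword.
    return "_" + "".join(secrets.choice(string.ascii_letters) for _ in range(length))

def make_mousemove_snippet(beacon_url: str) -> str:
    """Return a <script> fragment that requests beacon_url on the first mouse move."""
    handler = _random_identifier()
    flag = _random_identifier()
    while flag == handler:  # practically never happens, but keep the names distinct
        flag = _random_identifier()
    return (
        "<script>"
        f"var {flag}=false;"
        f"function {handler}(){{"
        f"if({flag})return;{flag}=true;"
        # Loading a fake image is enough; the server only records the hit.
        f"new Image().src={json.dumps(beacon_url)};"
        "}"
        f"document.addEventListener('mousemove',{handler});"
        "</script>"
    )

print(make_mousemove_snippet("/__probe/mm-1234.jpg"))
```

Because the identifiers change on every call, two pages never carry byte-identical scripts, which makes naive pattern matching by a robot slightly harder.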


Test with CoDeeN

• CoDeeN (http://codeen.cs.princeton.edu/)
  • Pull-based CDN running on PlanetLab for over 3 years
  • 25+ million requests from 50K clients per day
  • Malicious robots seeking to abuse it
• Results from a 1-week measurement
  • But the changes are now permanent


Main Result

Session breakdown:
• Robots: 71.1%
• CSS fetch: 28.9%
• JavaScript executed: 27.1%
• MouseMove: 22.3%
• Not sure, but human (potential false positives): 1.9%
• JS but no MouseMove: counted as robots
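The signals from the browser and human-activity probes can be combined into a per-session verdict. The rule order below is an assumption distilled from the slides (hidden link ⇒ robot, mouse movement ⇒ human, JavaScript without MouseMove ⇒ robot), not the paper's exact policy; `SessionSignals`, `classify`, and the 20-request grace period are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    HUMAN = "human"
    ROBOT = "robot"
    UNDECIDED = "undecided"

@dataclass
class SessionSignals:
    requests: int = 0                   # requests seen so far in this session
    fetched_css: bool = False           # fetched the injected empty stylesheet
    ran_javascript: bool = False        # evidence that the injected JavaScript ran
    mouse_moved: bool = False           # fired the MouseMove beacon
    followed_hidden_link: bool = False  # requested an invisible link

def classify(s: SessionSignals, grace_requests: int = 20) -> Verdict:
    if s.followed_hidden_link:
        return Verdict.ROBOT        # humans never follow hidden links
    if s.mouse_moved:
        return Verdict.HUMAN        # hardware event observed
    if s.requests < grace_requests:
        return Verdict.UNDECIDED    # too early to judge
    if not s.fetched_css:
        return Verdict.ROBOT        # does not behave like a standard browser
    if s.ran_javascript:
        return Verdict.ROBOT        # JS ran, but no MouseMove ever arrived
    return Verdict.UNDECIDED        # browser-like, but no JS evidence (e.g. JS disabled)
```

Sessions that fetch CSS but never run JavaScript stay undecided in this sketch, which is where clients with JavaScript disabled would land.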


Main Result

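• Robots: 71.1%
• CSS fetch: 28.9%

Maximum false positive rate:

$$\text{Max FP rate} = \frac{FP}{\text{negatives}} = \frac{\text{potential FP}}{\text{robots}} = \frac{1.9}{77.7} \approx 2.4\%$$

• Only 9% passed the (optional) CAPTCHA
• Only 0.9% followed hidden links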
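The 77.7% in the denominator is not the 71.1% "Robots" slice. One reading that matches the slide's numbers exactly is that it counts every session without a MouseMove event (100 - 22.3 = 77.7, equivalently 71.1 + (28.9 - 22.3)). The short computation below assumes that reading.

```python
# Percentages of all sessions, read off the slide.
mousemove_pct = 22.3    # sessions that produced a MouseMove event
potential_fp_pct = 1.9  # "not sure, but human" sessions

# Assumption: negatives = every session without a MouseMove event.
negatives_pct = round(100.0 - mousemove_pct, 1)
max_fp_rate = potential_fp_pct / negatives_pct

print(f"negatives = {negatives_pct}%")    # negatives = 77.7%
print(f"max FP rate = {max_fp_rate:.1%}") # max FP rate = 2.4%
```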


How Fast Can We Detect?

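[Figure: CDF of the number of requests required to detect a session (x-axis: number of requests, ticks 1 to 81; y-axis: fraction of sessions detected within X requests, 0 to 1), with curves for the CSS file, JavaScript files, and mouse events. 80% of sessions are detected within 20 requests, 95% within 57 requests.]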
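A CDF like this one can be computed from the number of requests each session needed before it was classified. A minimal sketch: the function names are hypothetical, and the sample counts are made up for illustration, not CoDeeN data.

```python
import math
from bisect import bisect_right
from typing import Sequence

def detection_cdf(requests_to_detect: Sequence[int], n: int) -> float:
    """Fraction of sessions detected within n requests."""
    ordered = sorted(requests_to_detect)
    return bisect_right(ordered, n) / len(ordered)

def requests_for_fraction(requests_to_detect: Sequence[int], fraction: float) -> int:
    """Smallest n such that at least `fraction` of sessions are detected within n requests."""
    ordered = sorted(requests_to_detect)
    # Small epsilon guards against float products landing just above an integer.
    k = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[k - 1]

# Made-up per-session counts, for illustration only.
sample = [3, 5, 8, 12, 15, 18, 20, 25, 40, 57]
print(detection_cdf(sample, 20))           # 0.7
print(requests_for_fraction(sample, 0.8))  # 25
```

Reading the slide's figure this way, "80% at 20 requests" means `requests_for_fraction(counts, 0.8) == 20` on the measured per-session counts.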


Number of CoDeeN Complaints

[Figure: number of CoDeeN abuse complaints per month (y-axis 0 to 10), January 2005 through May 2006, annotated with "Browser Detection" and "Human Activity Detection"]


Limitations

• Defeating browser detection
  • Behave exactly like a standard browser
• Human activity detection
  • Robots generating mouse/keyboard events
  • Disabled JavaScript: 4%
• Solution
  • Ensemble techniques


Machine Learning (AdaBoost)

[Figure: AdaBoost accuracy (%) versus number of requests (20 to 160) for the train set and the test set; the accuracy axis spans 88% to 96%]

Three most effective attributes:
1. RESPONSE CODE 300 %
2. REFERRER %
3. UNSEEN REFERRER %
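A sketch of extracting these three attributes from one session's request log. The slide gives only the attribute names, so the definitions are assumptions: "RESPONSE CODE 300 %" is read as the share of 3xx responses, "REFERRER %" as the share of requests carrying a Referer header, and "UNSEEN REFERRER %" as the share of requests whose referrer is not a URL the session already requested. `Request` and `session_features` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Request:
    url: str
    status: int                     # HTTP response status code
    referrer: Optional[str] = None  # value of the Referer header, if any

def session_features(reqs: List[Request]) -> Dict[str, float]:
    """Percentages over all requests in one session (definitions are assumptions)."""
    n = len(reqs)
    if n == 0:
        return {"response_3xx_pct": 0.0, "referrer_pct": 0.0, "unseen_referrer_pct": 0.0}
    seen_urls = set()
    redirects = with_referrer = unseen_referrer = 0
    for r in reqs:
        if 300 <= r.status < 400:
            redirects += 1
        if r.referrer:
            with_referrer += 1
            # Referrer points at a page this session never requested.
            if r.referrer not in seen_urls:
                unseen_referrer += 1
        seen_urls.add(r.url)
    return {
        "response_3xx_pct": 100.0 * redirects / n,
        "referrer_pct": 100.0 * with_referrer / n,
        "unseen_referrer_pct": 100.0 * unseen_referrer / n,
    }
```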

Drawbacks:
1. Heavy computation/memory
2. Patterns may change
3. Human intervention required
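For reference, textbook discrete AdaBoost over decision stumps looks like the sketch below. The talk does not specify the weak learners, feature set, or label convention; here labels are +1 for human and -1 for robot, feature vectors stand in for per-session attributes such as the three listed above, and the toy data is invented.

```python
import math
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, polarity): predict `polarity` if x[j] > threshold.
Stump = Tuple[int, float, int]

def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol

def train_adaboost(X: List[Sequence[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    stump = (j, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        if best_err >= 0.5:
            break  # no weak learner better than chance
        eps = max(best_err, 1e-10)  # avoid log(1/0) on a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break  # training data already separated perfectly
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # Invented toy data: [3xx %, referrer %, unseen referrer %]; +1 = human, -1 = robot.
    X = [[5, 90, 5], [2, 85, 10], [4, 80, 8], [40, 10, 60], [35, 5, 70], [50, 20, 55]]
    y = [1, 1, 1, -1, -1, -1]
    model = train_adaboost(X, y)
    print(predict(model, [3, 88, 7]), predict(model, [45, 8, 65]))
```

The exhaustive stump search costs O(rounds × features × n²) in this naive form, a small-scale illustration of why the slide lists heavy computation as a drawback.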


Conclusions

• Practical robot detection tool
  • Detects humans by
    • Standard browser behavior
    • Human activities
• An “arms race” in the end
  • Turing test
  • Most simple bots are screened out
• Ensemble techniques are promising