
Securing Web Service by Automatic Robot Detection

KyoungSoo Park, Vivek S. Pai
Princeton University

Kang-Won Lee, Seraphin Calo
IBM T.J. Watson Research Center

KyoungSoo Park USENIX 2006 2

Web Robots

• Automatic agents
  • Web crawlers
  • URL link checkers

• Malicious robots are widespread
  • Password cracking
  • Referrer/blog spamming
  • Click fraud on Google search
  • Burning CPU with heavy CGI queries


Contributions

• Real-time robot detector
• Fast detection
  • 80% of sessions detected within 20 requests, 95% within 57 requests
• High accuracy
  • 2.4% maximum false positive rate
• Low overhead
  • ~200 µs additional delay per page
• Easy deployment


Operational Scenario

• Server-side
  • Site web server
  • Many-to-one
• Client-side
  • Firewalls/proxies at the LAN
  • Many-to-many

[Figure: MON placed between Clients and Servers; panels: Server infrastructure, Client infrastructure]


Design Goals

• Transparency
  • No human intervention
• Accuracy
  • Minimal false positives
• Real-time proof
  • Periodic checks should be possible
  • Authentication or CAPTCHA is not enough
• Practicality


Observation & Intuition

Robot behavior
• Custom program
• Goal-oriented
• No embedded objects
• No index file
• Follows hidden links
• No hardware (mouse/keyboard) events

Human behavior
• Standard browsers
• Browsing purpose
• Cascading style sheets
• Images
• Never follows hidden links
• Mouse & keyboard

Humans are easier to detect


Browser Detection

• No standard browser ⇒ robot
• The “User-Agent” HTTP header? Any client can forge it
• Use behavioral artifacts (dynamic page modifications)
  • Redundant embedded objects
    • Empty cascading style sheet (CSS)
    • Invisible images (1x1 JPEG) or mute sounds
  • Hidden links
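Below is a minimal Python sketch of these dynamic modifications. It assumes a hypothetical `/__probe/<token>/<name>` URL scheme and a hypothetical `SessionTracker` class, and neither comes from the original system. Each served page receives an empty style sheet, a 1x1 image, and a hidden link, all keyed by a random token. Later fetches of those URLs are mapped back to the session that received the page.

```python
import secrets

# Hypothetical URL prefix for injected probe objects (an assumption, not the authors' scheme).
PROBE_PREFIX = "/__probe"


class SessionTracker:
    """Tracks which injected probe objects each client session has fetched."""

    def __init__(self):
        self.tokens = {}  # token -> session id
        self.events = {}  # session id -> set of probe kinds fetched

    def instrument(self, session_id: str, html: str) -> str:
        """Inject an empty CSS file, a 1x1 image and a hidden link keyed by a fresh token."""
        token = secrets.token_hex(8)
        self.tokens[token] = session_id
        self.events.setdefault(session_id, set())
        probes = (
            f'<link rel="stylesheet" href="{PROBE_PREFIX}/{token}/css.css">'
            f'<img src="{PROBE_PREFIX}/{token}/img.jpg" width="1" height="1" alt="">'
            f'<a href="{PROBE_PREFIX}/{token}/link.html" style="display:none"></a>'
        )
        # Insert just before </body> when present, otherwise append to the page.
        idx = html.lower().rfind("</body>")
        if idx == -1:
            return html + probes
        return html[:idx] + probes + html[idx:]

    def record_fetch(self, path: str) -> None:
        """Record a request whose path looks like PREFIX/<token>/<name>."""
        parts = path.split("/")  # "/__probe/abc/css.css" -> ["", "__probe", "abc", "css.css"]
        if len(parts) != 4 or "/" + parts[1] != PROBE_PREFIX:
            return
        session_id = self.tokens.get(parts[2])
        if session_id is None:
            return  # unknown token: ignore
        kind = parts[3].split(".")[0]  # "css", "img" or "link"
        self.events[session_id].add(kind)

    def verdict(self, session_id: str) -> str:
        seen = self.events.get(session_id, set())
        if "link" in seen:
            return "robot"    # humans never follow hidden links
        if "css" in seen or "img" in seen:
            return "browser"  # standard browsers fetch embedded objects
        return "unknown"
```

A real deployment would also need to serve the probe objects themselves (an empty CSS body and a 1x1 JPEG) and expire old tokens. Both are omitted here.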


Human Activity Detection

• Human activities ⇒ human
• Mouse/keyboard event tracking
  • Most robots don’t generate hardware events
• Dynamically embed JavaScript code
  • A MouseMove event triggers the event handler
  • The event handler fetches a fake image
  • The code is semantically and lexically obfuscated
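The JavaScript mechanism can be sketched in the same hedged way. In this Python version, the server issues a fresh random key per page and embeds it in a `mousemove` handler that requests a fake image. When that key comes back, the server marks the session as human. The `/__fake/<key>.jpg` path and the `ActivityChecker` class are hypothetical. The emitted script is also left unobfuscated, so a robot could simply extract the key from the page; the obfuscation on the slide exists to prevent exactly that.

```python
import secrets


class ActivityChecker:
    """Issues per-page JavaScript that fetches a fake image on the first
    mouse movement, then verifies that the fetched key was one we issued."""

    def __init__(self):
        self.pending = {}   # key -> session id, awaiting a MouseMove fetch
        self.humans = set()

    def script_for(self, session_id: str) -> str:
        key = secrets.token_hex(8)
        self.pending[key] = session_id
        # Plain JavaScript for readability; the original system obfuscates it.
        return (
            "<script>"
            "document.onmousemove = function () {"
            "  document.onmousemove = null;"  # fire only once
            f"  new Image().src = '/__fake/{key}.jpg';"
            "};"
            "</script>"
        )

    def on_fake_image(self, key: str) -> bool:
        """Handle a fetch of /__fake/<key>.jpg; return True if the key is valid."""
        session_id = self.pending.pop(key, None)
        if session_id is None:
            return False    # unknown or replayed key
        self.humans.add(session_id)
        return True
```

Each key is single-use, so replaying an observed fake-image URL does not mark a second session as human.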


Test with CoDeeN

• CoDeeN (http://codeen.cs.princeton.edu/)
  • Pull-based CDN running on PlanetLab for over 3 years
  • 25+ million requests from 50K clients per day
  • Malicious robots seeking abuse
• Results from a 1-week measurement
  • The changes are now permanent


Main Result

• Robots: 71.1%
• CSS fetch: 28.9%
  • JavaScript executed: 27.1%
    • MouseMove: 22.3%
  • JS but no MouseMove: robots
  • Not sure, but human: potential FP, 1.9%
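One reading of this breakdown as a decision rule is sketched below in Python. Under this reading, a session is labeled human only if it fetched CSS, executed JavaScript, and produced a MouseMove. Every other session is labeled robot, including the 1.9% "not sure, but human" sessions. This interpretation is inferred from the 77.7% robot denominator (100 - 22.3) used for the false positive rate. The function name and flags are hypothetical.

```python
def classify(fetched_css: bool, ran_javascript: bool, mouse_moved: bool) -> str:
    """Label a session from the three signals in the breakdown.

    Hypothetical rule inferred from the slide: only CSS + JavaScript +
    MouseMove counts as human; every other combination is labeled robot.
    """
    if not fetched_css:
        return "robot"   # no standard-browser behaviour (the 71.1% group)
    if ran_javascript and mouse_moved:
        return "human"   # browser plus a hardware event (the 22.3% group)
    return "robot"       # CSS, possibly JavaScript, but no MouseMove


# Labels for a few example flag combinations.
for flags in [(False, False, False), (True, False, False),
              (True, True, False), (True, True, True)]:
    print(flags, classify(*flags))
```

Because every session without a MouseMove is labeled robot, any real humans in that group (JavaScript disabled, or no mouse used) become false positives. That is why the 1.9% is reported as a maximum.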


Main Result

Max false positive rate = FP / negatives = (not sure, but human) / robots = 1.9 / 77.7 ≈ 2.4%, where robots = all sessions without a MouseMove = 100 - 22.3 = 77.7%

• Only 9% passed the (optional) CAPTCHA
• Only 0.9% followed hidden links


How Fast Can We Detect?

[Figure: CDF of the fraction of sessions detected within X requests; x-axis: number of requests required to detect; curves: CSS file, JavaScript files, Mouse events]

• 80% of sessions detected within 20 requests; 95% within 57 requests
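Figures such as "80% at 20 requests" are points on an empirical CDF. The Python sketch below shows how they can be read off. It assumes a hypothetical input list that holds, for each detected session, the number of requests seen before the session was first classified. The example counts are made up and are not CoDeeN data.

```python
from bisect import bisect_right
from typing import List, Optional


def detection_cdf(requests_to_detect: List[int], x: int) -> float:
    """Fraction of sessions detected within at most x requests (list must be non-empty)."""
    ordered = sorted(requests_to_detect)
    return bisect_right(ordered, x) / len(ordered)


def requests_for_fraction(requests_to_detect: List[int], p: float) -> Optional[int]:
    """Smallest observed request count at which the CDF reaches fraction p."""
    ordered = sorted(requests_to_detect)
    for x in ordered:
        if bisect_right(ordered, x) / len(ordered) >= p:
            return x
    return None  # p > 1 can never be reached


# Hypothetical per-session counts, for illustration only.
counts = [1, 3, 3, 5, 20, 21, 57, 80]
print(detection_cdf(counts, 20))           # 0.625
print(requests_for_fraction(counts, 0.8))  # 57
```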


Number of CoDeeN Complaints

[Figure: number of CoDeeN abuse complaints per month, Jan 2005 to May 2006 (y-axis 0 to 10); marked points: Browser Detection, Human Activity Detection]


Limitations

• Defeating browser detection
  • Behave exactly like a standard browser
• Human activity detection
  • Robots generating mouse/key events
  • JavaScript disabled: 4%
• Solution
  • Ensemble techniques


Machine Learning (AdaBoost)

[Figure: accuracy (%, 88 to 96) vs. number of requests (20 to 160) for the train set and the test set]

Three most effective attributes:
1. RESPONSE CODE 300%
2. REFERRER %
3. UNSEEN REFERRER %

Drawbacks:
1. Heavy computation/memory
2. Patterns may change
3. Human intervention
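The Python sketch below shows one way the three attributes might be computed per session before being fed to a boosting classifier such as scikit-learn's `AdaBoostClassifier`. The definitions are assumptions, not the authors':

* RESPONSE CODE 300% is read as the share of 3xx responses.
* REFERRER % is read as the share of requests carrying a Referer header.
* UNSEEN REFERRER % is read as the share whose referrer was not requested earlier in the same session.

The `Request` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    url: str                 # requested URL
    status: int              # HTTP response code
    referrer: Optional[str]  # Referer header, or None if absent


def session_features(requests: List[Request]) -> Dict[str, float]:
    """Percentage of a session's requests matching each (assumed) attribute."""
    n = len(requests)
    if n == 0:
        return {"response_code_3xx_pct": 0.0, "referrer_pct": 0.0,
                "unseen_referrer_pct": 0.0}
    seen_urls = set()
    code_3xx = with_referrer = unseen_referrer = 0
    for r in requests:
        if 300 <= r.status < 400:
            code_3xx += 1
        if r.referrer:
            with_referrer += 1
            if r.referrer not in seen_urls:
                unseen_referrer += 1  # referrer never fetched earlier in this session
        seen_urls.add(r.url)
    return {
        "response_code_3xx_pct": 100.0 * code_3xx / n,
        "referrer_pct": 100.0 * with_referrer / n,
        "unseen_referrer_pct": 100.0 * unseen_referrer / n,
    }
```

Recomputing these features as a session accumulates requests is one way to get the accuracy-versus-requests curve on the slide. The cost of doing so for every live session is also an illustration of the "heavy computation/memory" drawback.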


Conclusions

• Practical robot detection tool
  • Detects humans by
    • Standard browser behavior
    • Human activities
• “Arms race” in the end
  • Turing test
  • Most simple bots screened out
• Ensemble techniques are promising
