
Juice: A Longitudinal Study of an SEO Campaign

David Y. Wang, Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego


Background

• A Black Hat Search Engine Optimization (SEO) campaign is a coordinated effort to obtain user traffic through abusive means
  – Supported by a botnet of compromised Web sites
  – Poisons search results
  – Feeds traffic to scams (e.g. Fake Anti-Virus)

• Link Juice refers to the backlinks (references) a site receives
  – Believed to influence search result ranking
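
To make the notion of link juice concrete, here is a minimal sketch that counts the backlinks each site receives in a small link graph. The site names and the `count_backlinks` helper are hypothetical, and a raw count is only a stand-in: the slides say backlinks are believed to influence ranking, not how a search engine weighs them.

```python
from collections import Counter


def count_backlinks(links):
    """Count inbound links per site from (source, target) pairs.

    Self-links and duplicate pairs are ignored, so each referring
    site contributes at most one backlink to a given target.
    """
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)


# Hypothetical link graph; the site names are placeholders.
links = [
    ("site_a", "site_b"),
    ("site_a", "site_c"),
    ("site_b", "site_c"),
    ("site_b", "site_c"),  # duplicate pair, counted once
    ("site_c", "site_c"),  # self-link, ignored
]
backlinks = count_backlinks(links)
```

In this toy graph `site_c` receives the most backlinks, which is the effect a campaign tries to create for its doorways by cross linking compromised sites.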

[Diagram: Attacker, Doorway (the targeted Website), Search Engine Web Crawler, User, Scams]

• We begin with an attacker + a targeted Website
• (1) The attacker compromises the Website using an open vulnerability + installs an SEO kit
• (2) When a Web crawler tries to fetch a page (GET /volcano.html), the crawler receives a page intended to rank well
• (3) The page gets indexed by Google
• (4) When a user searches in Google for “volcano” + clicks on the compromised page…
• (5) …he is redirected to a scam of the attacker’s choosing


Our Contributions

• Infiltrate an influential SEO botnet (GR)
  – In-depth characterization of GR’s operation
    • One-time leader in poisoned search results on Google
  – Our work builds on previous work studying search result poisoning [John11, Lu11, Moore11]

• Draw insights from combining data from three separate data sources (crawlers):
  – Estimate GR’s effectiveness
  – Examine impact of scams funding GR


SEO Kit

• An SEO kit is software installed on compromised sites
  – Allows backdoor access for botmaster
  – Performs Black Hat SEO (i.e. cloaking, content generation, user redirection)
  – Typically they are obfuscated code snippets injected into pages

<?php
if(!function_exists('cm4y2wui5w153')) {
function cm4y2wui5w153($smcx){$dix5xk='x);';…}
?>

<?php
// Общее  (Russian: "General")
define("GR_CACHE_ID", "v8_cache");
define("GR_SCRIPT_VERSION", "v8.0 (28.02.2012)");
?>


Anecdote

• Obtained a copy of the GR SEO kit by contacting owners of compromised sites
  – Roughly 40 attempts
  – A handful were willing to help
  – But only 1 person was able to disinfect their site and send us the kit
• The SEO kit allows us to infiltrate the botnet and understand how the campaign works


GR Botnet Architecture

• The GR Botnet is built using pull mechanisms and comprises 3 types of hosts:
  – Compromised Web Sites act as doorways for visitors and control which content is returned
  – The Directory Server’s only role is to return the location of the C&C Server
  – The C&C Server acts as a centralized content server for the GR Botmaster

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

• User requests a page from a compromised site (HTTP GET index.html)
• Compromised site tries to look up the location of the C&C Server (“Where is the C&C?”)
• Directory Server returns the location (“The C&C is @ 1.2.3.4”)
• Compromised site fetches content to return to the user from the C&C Server (“What should I return to the user?”)
• C&C Server responds (“Here are some scams for the user”)
• User is redirected to scams
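
The visit described above can be modelled as two pull requests made by the compromised site. The sketch below is an in-memory simulation only: the class names, the `network` dictionary that stands in for real networking, and the mapping from visitor type to content are all assumptions made for illustration, not the kit's actual protocol.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryServer:
    """Only job: report where the C&C server currently lives."""
    cnc_address: str

    def locate_cnc(self) -> str:
        return self.cnc_address


@dataclass
class CncServer:
    """Centralized content server; maps a visitor type to content."""
    content: dict = field(default_factory=dict)

    def content_for(self, visitor: str) -> str:
        # Unknown visitors get the untouched page (an assumption).
        return self.content.get(visitor, "original page")


@dataclass
class CompromisedSite:
    directory: DirectoryServer
    network: dict  # address -> CncServer, stands in for real networking

    def handle_request(self, visitor: str) -> str:
        # Pull 1: ask the directory server where the C&C is.
        address = self.directory.locate_cnc()
        # Pull 2: ask the C&C what to return to this visitor.
        return self.network[address].content_for(visitor)


cnc = CncServer(content={"search_user": "redirect to scam", "crawler": "SEO page"})
directory = DirectoryServer(cnc_address="1.2.3.4")
site = CompromisedSite(directory=directory, network={"1.2.3.4": cnc})
```

Here `site.handle_request("search_user")` yields the redirect, while a crawler gets the page intended to rank well. Because the site learns the C&C address on each visit, this design would presumably let the botmaster relocate the C&C without touching the compromised sites.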


Data Collection

• We collect data using 3 distinct crawlers
  – Odwalla crawls and monitors compromised sites in the GR botnet (October 2011 – June 2012)
  – Dagger measures poisoned search results for trending searches (April 2011 – August 2011)
  – Trajectory crawls pages using a Web browser to follow redirects (April 2011 – August 2011)
• Although timeframes do not overlap cleanly, we can still draw insights


Odwalla

• Odwalla crawls GR’s topology
• Begin w/ poisoned search results [Dagger]
• Takes advantage of two characteristics of the compromised sites in GR:
  – Sites respond to the C&C protocol by returning diagnostic information (easy confirmation)
  – Sites are cross linked with other compromised sites in order to manipulate search rankings (find more compromised sites)


Results

• What are the characteristics of GR?
  – Size, Churn, Lifetime
• How effective is GR in poisoning Google?
  – We focus on how many poisoned search results are exposed to the user
• Longitudinal data allows us to identify long-term trends
  – Monetization through scams


GR Size + Churn

• GR is modest in size
• There is little churn amongst nodes

GR Lifetime

• We define lifetime as the time between the first and last time Odwalla observed the SEO kit running on a site
• A site is sanitized when it no longer responds to the C&C protocol for 8 consecutive days
• Compromised sites are long lived (months at a time) and able to support GR w/ high availability
• SEO kits want to hide their presence from site owners
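
The two definitions on this slide (lifetime, and sanitized after 8 consecutive silent days) can be written down directly. This is a minimal sketch under one stated assumption: the crawler probed the site every day through `crawl_end`, so any day without an observation counts as a day without a response. The function names and the sample dates are hypothetical.

```python
from datetime import date

SANITIZED_AFTER_DAYS = 8  # from the slide: 8 consecutive days without a response


def lifetime_days(observations):
    """Days between the first and last time the SEO kit was seen running."""
    return (max(observations) - min(observations)).days


def is_sanitized(observations, crawl_end):
    """True if the site has been silent for at least 8 days up to crawl_end.

    Assumes daily probing, so every day after the last observation
    is a day on which the site did not answer the C&C protocol.
    """
    silent_days = (crawl_end - max(observations)).days
    return silent_days >= SANITIZED_AFTER_DAYS


# Hypothetical observation dates for a single site.
seen = [date(2011, 10, 1), date(2011, 10, 2), date(2011, 12, 15)]
```

With these sample dates the lifetime spans from 1 October to 15 December 2011, and the site only counts as sanitized once the crawl has continued for 8 more days without a response.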


Effectiveness

• Measure effectiveness of GR by the volume of poisoned search results

• Intersect known compromised sites [Odwalla] with poisoned search results on Google [Dagger]

• Label each poisoned search result as:
  – Active: cloaking + redirecting users
  – Tagged: neutralized via Google Safe Browsing
  – Dormant: cloaking, but not redirecting users
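
The intersection and labelling steps can be sketched as follows. The record fields (`host`, `flagged_by_gsb`, `redirects_users`) and both function names are hypothetical, and checking the Safe Browsing flag before the redirect behaviour is an assumption about precedence that the slide does not spell out.

```python
from enum import Enum


class Label(Enum):
    ACTIVE = "active"    # cloaking + redirecting users
    TAGGED = "tagged"    # neutralized via Google Safe Browsing
    DORMANT = "dormant"  # cloaking, but not redirecting users


def gr_poisoned_results(search_results, compromised_sites):
    """Keep the poisoned results whose host is a known compromised GR site."""
    return [r for r in search_results if r["host"] in compromised_sites]


def label_result(flagged_by_gsb: bool, redirects_users: bool) -> Label:
    """Label one poisoned search result that is known to be cloaking."""
    if flagged_by_gsb:  # assumed to take precedence over redirect behaviour
        return Label.TAGGED
    return Label.ACTIVE if redirects_users else Label.DORMANT


# Hypothetical inputs standing in for the Odwalla and Dagger data sets.
compromised = {"doorway.example"}
results = [
    {"host": "doorway.example", "flagged_by_gsb": False, "redirects_users": True},
    {"host": "other.example", "flagged_by_gsb": True, "redirects_users": False},
]
gr_results = gr_poisoned_results(results, compromised)
labels = [label_result(r["flagged_by_gsb"], r["redirects_users"]) for r in gr_results]
```

Only the first record survives the intersection, since `other.example` is not a known GR site, and it is labelled active.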

• Multiple periods of activity: Start, Surge, Steady, Idle
  – Start: Mostly tagged, active ramping up
  – Surge: Active surges with little pressure from GSB (Google Safe Browsing)
  – Steady: Tagged increases, but many active still present
  – Idle: Total volume drops, lack of monetization


Market Share

• Compare GR against all poisoned search results
• GR accounts for the majority of poisoned search results during the surge period (58%)


Monetization

• To identify the final scam from redirection data [Trajectory], we select chains that:
  – Originate from a GR doorway
  – Contain 1+ cross site redirect
  – Occur while mimicking MSIE

• Manually cluster + classify scams
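
The three chain-selection criteria can be expressed as a simple filter. This is a sketch, not the authors' tooling: the `RedirectChain` record and `is_candidate` function are hypothetical, a cross site redirect is assumed to mean a hop between two different hostnames, and "mimicking MSIE" is assumed to mean the crawler's User-Agent string contains `MSIE`.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class RedirectChain:
    urls: list       # every URL visited, doorway first, final page last
    user_agent: str  # User-Agent string the crawler presented


def host(url: str) -> str:
    """Hostname of a URL, or an empty string if it has none."""
    return urlsplit(url).hostname or ""


def is_candidate(chain: RedirectChain, gr_doorways: set) -> bool:
    """Apply the three selection criteria to one redirect chain."""
    # 1. The chain must originate from a known GR doorway.
    if not chain.urls or host(chain.urls[0]) not in gr_doorways:
        return False
    # 2. At least one hop must cross from one host to another.
    hosts = [host(u) for u in chain.urls]
    cross_site = any(a != b for a, b in zip(hosts, hosts[1:]))
    # 3. The visit must have been made while presenting an MSIE User-Agent.
    return cross_site and "MSIE" in chain.user_agent


# Hypothetical chain; both hostnames are placeholders.
chain = RedirectChain(
    urls=["http://doorway.example/volcano.html", "http://scam.example/av"],
    user_agent="Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
)
```

The final URL of each selected chain is what would then be clustered and classified by hand.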


Monetization

• Experimentation w/ affiliate programs
• Early on, Fake AV is the scam of choice


Monetization

• FBI crackdown on Fake AV industry sent GR into flux


Conclusion

• GR is very effective at poisoning search results even with modest resources

• Fake AV was the financial motivation that drove innovation in GR (the killer scam)

• Pure technical interventions had some effect, but it was the financial intervention that forced GR into retirement


Thank You!

• Questions?

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; labels “Super Bowl”, “Beyonce”, “Super Bowl”]

• Odwalla wants to test whether Site_0 is part of GR
• Odwalla uses the C&C protocol to initiate a handshake w/ Site_0
• Site_0 responds w/ diagnostic info, confirming membership in GR:
  Version: v MAC 1 (28.10.2011)
  Cache ID: v7mac_cache
  Host ID: example.com
• In addition we discover Site_0 juicing Site_1 and Site_2
• Odwalla tests Site_1 and Site_2
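
The example above amounts to a breadth-first crawl: confirm a site with the handshake, then follow its cross links to new candidates. The sketch below simulates that loop in memory. The `crawl` function, the `probe` and `cross_links` callables, and the toy topology (including the uninfected `Site_3`) are hypothetical stand-ins, not Odwalla's actual implementation.

```python
from collections import deque


def crawl(seeds, probe, cross_links):
    """Breadth-first discovery of compromised sites.

    probe(site) -> bool: True if the site answers the C&C handshake.
    cross_links(site) -> iterable of sites that this site links to.
    """
    confirmed = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        if not probe(site):
            continue  # not confirmed as a member, so its links are not followed
        confirmed.append(site)
        for neighbor in cross_links(site):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return confirmed


# Toy topology: Site_0 juices Site_1 and Site_2; Site_3 is an ordinary site.
LINKS = {"Site_0": ["Site_1", "Site_2"], "Site_1": ["Site_0"], "Site_2": ["Site_3"]}
INFECTED = {"Site_0", "Site_1", "Site_2"}

found = crawl(
    ["Site_0"],
    probe=lambda s: s in INFECTED,
    cross_links=lambda s: LINKS.get(s, []),
)
```

Starting from the single seed, the crawl confirms all three compromised sites and stops at `Site_3`, which fails the handshake. In practice the seeds would come from poisoned search results, as the Odwalla slide notes.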