Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech

Post on 27-Mar-2015

Spam and Botnets: Characterization and Mitigation

Nick Feamster
Anirudh Ramachandran
David Dagon
Georgia Tech

Talk Overview

• Network-level behavior of spammers
  – Ultimate goal: Construct spam filters based on network-level properties, rather than content
  – Content-based properties are malleable
    • Low cost of evasion: Spammers can easily alter content
    • High admin cost: Filters must be continually updated
  – Content-based filters are applied at the destination
    • Too little, too late: Wasted network bandwidth, storage, etc.
• Study of DNS-based blacklists
• “Discovery”: One of the most telling network-level properties is botnet membership
  – DNSBL Counter-Intelligence
  – Network monitoring

Network-Level Behavior of Spammers: Major Findings

• Where does spam come from?
  – Most received from few regions of IP address space
  – Insight about spammier prefixes could improve filters
• Do spammers hijack routes?
  – A small set of spammers continually advertise short-lived routes
  – Traceability is not guaranteed
• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

Data Collection

• Two domains instrumented with MailAvenger (both on same network)
  – Sinkhole domain #1
    • Continuous spam collection since Aug 2004
    • No real email addresses: sink everything
    • 10 million+ pieces of spam
• Legitimate mail corpus from a large email provider (40 million inboxes)
• Monitoring BGP route advertisements from same network

Mail Collection: MailAvenger

• Highly configurable SMTP server that collects many useful statistics

BGP Data Collection

[Figure: BGP data collection setup; labels: MX 1, MX 2]

Spam Study: Major Findings

• Where does spam come from?
  – Most received from few regions of IP address space
  – Insight about spammier prefixes could improve filters
• Do spammers hijack routes?
  – A small set of spammers continually advertise short-lived routes
  – Traceability is not guaranteed
• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

What IP ranges does spam come from?

[Figure: x-axis: /24 prefix, y-axis: Fraction]

Spam comes from a few concentrated regions of IP address space
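
One way to check how concentrated spam is across address space is to bucket sender IPs by /24 prefix and compute each prefix's share of received messages. The sketch below is only an illustration: the function names and the sample addresses are hypothetical, not taken from the talk's data or tooling.

```python
from collections import Counter
import ipaddress


def prefix24(ip: str) -> str:
    """Return the /24 prefix that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def spam_share_by_prefix(sender_ips):
    """Map each /24 prefix to its fraction of all received spam messages."""
    counts = Counter(prefix24(ip) for ip in sender_ips)
    total = sum(counts.values())
    # most_common() orders prefixes from heaviest to lightest sender.
    return {prefix: n / total for prefix, n in counts.most_common()}


# Hypothetical sender IPs, one entry per received spam message.
senders = ["192.0.2.10", "192.0.2.77", "192.0.2.200", "198.51.100.5"]
shares = spam_share_by_prefix(senders)
```

Sorting the shares and accumulating them would give a curve of the kind the slide describes, assuming the plotted quantity is a per-prefix fraction of spam.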

Distribution across ASes

Points to note
• Top two spamming ASes: 10% of received spam
• ASes in the US: most spam
• Top ASes for legitimate email are different

[Tables: Top 10 ASes by Spam Count; Top 10 ASes by Legit Email Count]

Spam Study: Major Findings

• Where does spam come from?
  – Most received from few regions of IP address space
  – Insight about spammier prefixes could improve filters
• Do spammers hijack routes?
  – A small set of spammers continually advertise short-lived routes
  – Traceability is not guaranteed
• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

BGP Spectrum Agility

• Log IP addresses of SMTP relays
• Join with BGP route advertisements seen at the network where the spam trap is co-located

A small club of persistent players appears to be using this technique.

Common short-lived prefixes and ASes

Prefix        AS
61.0.0.0/8    4678
66.0.0.0/8    21562
82.0.0.0/8    8717

[Figure annotation: ~ 10 minutes]

Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
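
The join described above can be sketched as follows: for each spam delivery, check whether the sender's address was covered by a route that was live at delivery time and that lasted only a short while. This is a minimal illustration under stated assumptions, not the authors' analysis: the `Route` record, the log format, and the 600-second threshold (chosen to match the "~ 10 minutes" annotation) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str        # announced prefix, e.g. "61.0.0.0/8"
    announced: float   # announcement time, in seconds
    withdrawn: float   # withdrawal time, in seconds


def spam_on_short_lived_routes(spam_log, routes, max_lifetime=600.0):
    """Return (ip, timestamp, prefix) for spam whose sender IP was covered
    by a route that was live at delivery time and lasted at most
    max_lifetime seconds."""
    short = [
        (ipaddress.ip_network(r.prefix), r)
        for r in routes
        if r.withdrawn - r.announced <= max_lifetime
    ]
    hits = []
    for ip, ts in spam_log:
        addr = ipaddress.ip_address(ip)
        for net, r in short:
            if addr in net and r.announced <= ts <= r.withdrawn:
                hits.append((ip, ts, r.prefix))
                break  # one matching short-lived route is enough
    return hits


# Hypothetical data: one short-lived /8 and one long-lived /8.
routes = [Route("61.0.0.0/8", 1000, 1500), Route("66.0.0.0/8", 0, 100000)]
spam_log = [("61.1.2.3", 1200), ("61.1.2.3", 5000), ("66.9.9.9", 50)]
hits = spam_on_short_lived_routes(spam_log, routes)
```

A real analysis would also need to account for overlapping, more-specific routes that cover the same address, which this sketch ignores.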

Why Such Big Prefixes?

• Flexibility: Client IPs can be scattered throughout dark space within a large /8
  – Same sender usually returns with different IP addresses
• Visibility: Route typically won’t be filtered (nice and short)

Spam Study: Major Findings

• Where does spam come from?
  – Most received from few regions of IP address space
  – Insight about spammier prefixes could improve filters
• Do spammers hijack routes?
  – A small set of spammers continually advertise short-lived routes
  – Traceability is not guaranteed
• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

Characteristics of spamming bots

• Distribution across IP space for bots
  – Similar to IP space distribution for all spam
  – Lower bot activity in ranges where spam also comes from hijacked routes
• Operating Systems of Spamming Hosts
  – ~ 95% run Windows
  – The 4% Unix-based hosts send up to 8% of spam

Most Bots Send Low Volumes of Spam

[Figure: x-axis: Lifetime (seconds), y-axis: Amount of Spam; annotation: 99% of bots]

Most bot IP addresses send very little spam, regardless of how long they have been spamming…

Most Bot IP addresses are quiet

65% of bots only send mail to a domain once over 18 months

Blacklists may want to target IP ranges, rather than individual IPs

[Figure: x-axis: Lifetime (seconds), y-axis: Percentage of bots]
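
Per-IP volume and lifetime statistics of this kind can be computed from a delivery log. The sketch below is a hypothetical illustration: it assumes a log of `(ip, timestamp)` pairs and defines lifetime as the gap between the first and last message from an IP, which may differ from the definition used in the talk.

```python
from collections import defaultdict


def per_ip_activity(deliveries):
    """Summarize each sender IP as (message count, lifetime in seconds).

    Lifetime is the gap between the first and last message seen from the IP.
    """
    times = defaultdict(list)
    for ip, ts in deliveries:
        times[ip].append(ts)
    return {ip: (len(ts), max(ts) - min(ts)) for ip, ts in times.items()}


def single_shot_fraction(activity):
    """Fraction of IPs that sent exactly one message."""
    if not activity:
        return 0.0
    once = sum(1 for count, _ in activity.values() if count == 1)
    return once / len(activity)


# Hypothetical delivery log: (sender IP, arrival time in seconds).
deliveries = [("192.0.2.1", 100), ("192.0.2.2", 200),
              ("192.0.2.2", 900), ("192.0.2.3", 300)]
activity = per_ip_activity(deliveries)
fraction = single_shot_fraction(activity)
```

If most IPs appear only once, per-IP listing has little to act on, which is consistent with the slide's suggestion to target IP ranges instead.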

Take-Away Lessons

• Network-level properties are less malleable, and are observable closer to the source of spam

• Aggregate properties (e.g., IP prefix, ASN, route used, etc.) may be more effective

• Some network-level properties can be incorporated into spam filters
  – Could be used as a first-pass filter

• Spam filtering requires a better notion of end-host identity

• Securing the Internet routing infrastructure is key to traceability

Network-Level Spam Filtering

Redefining End-Host Identifiers

What about DNS-Based Blacklists?

• DNS-based Blacklists (DNSBLs)
  – The most prevalent network-level spam filtering mechanism today
  – Various criteria: open relays/proxies, virus senders, bad/unused address spaces, etc.
  – Hundreds of DNSBLs of all sizes
• How to measure the effectiveness of DNSBLs?
  – Completeness
  – Responsiveness
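
For readers unfamiliar with the mechanism: a DNSBL is queried by reversing the octets of the IPv4 address, appending the blacklist's zone, and looking up the resulting name; an answer means the address is listed. The sketch below illustrates this. The zone `dnsbl.example.org` is a placeholder, not a real blacklist, and `is_listed` is a simplification that treats every resolution failure as "not listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name queried for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an address record.

    Performs a live DNS lookup; any resolution failure is treated as
    'not listed', which also hides genuine resolver errors.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Placeholder zone; substitute the zone of an actual DNSBL to query it.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Because every check is an ordinary DNS query, the operator of a blacklist (or a mirror) can observe who is looking up which address, which is what the later measurement slides rely on.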

Questions about DNSBLs

• What is the completeness of the DNSBL?
• What is the responsiveness of the DNSBL?
  – How many distinct domains are targeted by a spamming host before it is blacklisted?
• Does frequency of spam from a host change after it is blacklisted?

Blacklisting: Completeness

[Figure: x-axis: Number of DNSBLs listing this spammer, y-axis: Fraction of all spam received]

~80% listed on average

~95% of bots listed in one or more blacklists

Only about half of the IPs spamming from short-lived BGP routes are listed in any blacklist

Spam from IP-agile senders tends to be listed in fewer blacklists
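
Completeness can be estimated by counting, for each sender, how many blacklists list it. The sketch below is a hypothetical illustration with made-up blacklist names and addresses. It measures the fraction of distinct sender IPs that are listed, whereas the figure's y-axis is a fraction of spam received, so the two are related but not identical.

```python
def listing_counts(sender_ips, blacklists):
    """Map each distinct sender IP to the number of blacklists listing it."""
    return {ip: sum(ip in bl for bl in blacklists.values()) for ip in sender_ips}


def completeness(sender_ips, blacklists):
    """Fraction of distinct sender IPs listed in at least one blacklist."""
    counts = listing_counts(sender_ips, blacklists)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 0) / len(counts)


# Hypothetical blacklist snapshots, each a set of listed addresses.
blacklists = {
    "list_a": {"192.0.2.1", "192.0.2.2"},
    "list_b": {"192.0.2.2"},
}
senders = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
counts = listing_counts(senders, blacklists)
coverage = completeness(senders, blacklists)
```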

Are IP-Based Blacklists Enough?

• Mail Avenger is very aggressive
  – Eight different blacklists
• Cloaking techniques complicate detection
  – For example, what if a bot could change IP addresses and remain reachable?
    • LAN agility
    • BGP agility

A Model of Responsiveness

• Response Time
  – Difficult to calculate without “ground truth”
  – Can still estimate lower bound

[Figure: Lifecycle of a spamming host; timeline (axis: Time) with events Infection, S-Day, Possible Detection Opportunity, RBL Listing, and the Response Time interval marked]

Measuring Responsiveness

• Data
  – 1.5 days’ worth of packet captures of DNSBL queries from a mirror of Spamhaus
  – 46 days of pcaps from a hijacked C&C for a Bobax botnet; overlaps with DNSBL queries
• Method
  – Monitor DNSBL for lookups for known Bobax hosts
    • Look for first query
    • Look for the first time a query response had a ‘listed’ status
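
The method above can be sketched as a single pass over the observed DNSBL queries: record, per IP, the time of the first query and the time of the first response with a "listed" status, then take the difference. The record format below is an assumption made for illustration. Because the first observed query may come after the host started spamming, the result can only be a lower bound on the true response time.

```python
def response_times(queries):
    """For each IP, return seconds between its first observed query and
    its first 'listed' response. IPs never seen as listed are omitted.

    queries: iterable of (timestamp, ip, listed) tuples in any order.
    """
    first_query = {}
    first_listed = {}
    for ts, ip, listed in sorted(queries):
        first_query.setdefault(ip, ts)       # keeps the earliest timestamp
        if listed:
            first_listed.setdefault(ip, ts)  # keeps the earliest listed answer
    return {ip: first_listed[ip] - first_query[ip] for ip in first_listed}


# Hypothetical observations: (time in seconds, queried IP, listed?).
queries = [
    (10, "192.0.2.1", False),
    (50, "192.0.2.1", True),
    (20, "192.0.2.2", False),
    (5, "192.0.2.3", True),
]
delays = response_times(queries)
```

An IP whose very first observed query already returns "listed" yields a delay of zero, which says nothing about how long it was active before the trace began.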

Responsiveness: Preliminary Results

• Observed 81,950 DNSBL queries for 4,295 (out of over 2 million) Bobax IPs
• Only 255 (6%) Bobax IPs were blacklisted through the end of the Bobax trace (46 days)
  – 88 IPs became listed during the 1.5 day DNSBL trace
  – 34 of these were listed after a single detection opportunity

Both responsiveness and completeness appear to be low. Much room for improvement.

Domains Performing Lookups

• Over 60% are queried by just one IP/AS
  – Hypothesis: Decreased chances of being reported

So…What can be Done?

• Network-level behavior of spammers
  – Ultimate goal: Construct spam filters based on network-level properties, rather than content
  – Content-based properties are malleable
    • Low cost of evasion: Spammers can easily alter content
    • High admin cost: Filters must be continually updated
  – Content-based filters are applied at the destination
    • Too little, too late: Wasted network bandwidth, storage, etc.
• Study of DNS-based blacklists
• “Discovery”: One of the most telling network-level properties is botnet membership
  – DNSBL Counter-Intelligence
  – Network monitoring

Mitigation #1: Counter-Intelligence

• Botmasters advertise spamming bots that are not listed in any blacklist.

• Insight: Someone must be looking up the bots!

• Can we fish out these DNSBL “reconnaissance” queries and identify subjects/targets as suspect?

Legit Queries vs. Reconnaissance

• Legitimate queriers are also the targets of queries

• Reconnaissance queriers are usually not queried themselves

[Figure, first panel: Legit Mail Server A (mx.a.com) and Legit Mail Server B (mx.b.com) with a DNS-Based Blacklist; labels: "email to mx.a.com", "email to mx.b.com", "lookup mx.a.com", "lookup mx.b.com"]

[Figure, second panel: a Reconnaissance host and a DNS-Based Blacklist]

Measurement Approach

• Log Spamhaus queries

• Construct querier/queried graph

• Prune graph: only nodes in the Bobax trace

• Examine nodes with high out-degree
  – Hypothesis: targets of nodes with high out-degree likely bots
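
The steps above can be sketched with a small adjacency map. This is a hypothetical illustration rather than the authors' implementation: the log format is assumed, and "pruning" is interpreted here as keeping only edges whose queried address appears in the known bot set.

```python
from collections import defaultdict


def build_query_graph(query_log, known_bots):
    """Build querier -> {queried} edges, keeping only edges whose queried
    IP is in the known bot set (one interpretation of the pruning step)."""
    graph = defaultdict(set)
    for querier, queried in query_log:
        if queried in known_bots:
            graph[querier].add(queried)
    return graph


def rank_by_out_degree(graph):
    """Return (querier, out-degree) pairs, highest out-degree first."""
    rows = [(querier, len(targets)) for querier, targets in graph.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical query log: (querier, queried) pairs seen at the DNSBL.
known_bots = {"b1", "b2", "b3"}
query_log = [("r", "b1"), ("r", "b2"), ("r", "b3"),
             ("m1", "b1"), ("b1", "b2"), ("m1", "x")]
graph = build_query_graph(query_log, known_bots)
ranked = rank_by_out_degree(graph)
```

Queriers at the top of the ranking are candidates for reconnaissance, and under the slide's hypothesis their targets are candidate bots. Pruning early also keeps the graph small, since edges to unknown addresses are never stored.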

Who’s Doing the Lookups?

• The botmaster, on behalf of the bots
• The bots, on behalf of themselves
• The bots, on behalf of each other

[Figure labels: Spam Sinkhole; Known Bobax drone!]

Implication: Use a “seed” set to bootstrap?

Some Problems with Counter-Intel

• Constructing the query graph is intensive
  – Computationally
  – Storage-wise

• Initially pruning the graph with IP addresses of known suspects (e.g., spammers) could help

Mitigation: Network Monitoring

• In-network filtering
  – Requires the ability to detect botnets

• Question: Can we detect botnets by observing communication structure among hosts?

Example: Migration between command and control hosts

New type of problem: essentially coupon collection
How good are current traffic sampling techniques at exposing these patterns?
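
To see why sampling matters here, consider a simple model that is an assumption of this sketch and not the experimental setup from the talk: each packet is sampled independently with probability p, so a conversation of k packets is observed at least once with probability 1 - (1 - p)^k. Short conversations are then very likely to be missed at low sampling rates.

```python
def detection_probability(packets: int, rate: float) -> float:
    """Probability that at least one of `packets` packets is sampled when
    each packet is sampled independently with probability `rate`."""
    return 1.0 - (1.0 - rate) ** packets


def expected_conversations_seen(packet_counts, rate: float) -> float:
    """Expected number of conversations with at least one sampled packet."""
    return sum(detection_probability(n, rate) for n in packet_counts)


# Hypothetical conversation sizes (packets each) and a 1-in-100 sampling rate.
packet_counts = [1, 10, 100]
expected = expected_conversations_seen(packet_counts, rate=0.01)
```

Under this model, observing every conversation in a communication structure requires each one to be sampled at least once, which is the coupon-collection flavor the slide refers to.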

Experimental Setup

(Preliminary) Results

[Figure annotation: Feasible sampling rates]

Conventional sampling techniques are not well suited to collecting conversations

Summary Lessons

• Network-level spam filtering holds promise
  – Potentially a useful complement to content-based filters
  – Today’s DNSBLs aren’t doing the trick
• Two critical pieces
  – Monitoring techniques (which might be used together)
    • In-network (e.g., with better traffic monitoring techniques)
    • At the edge (e.g., DNSBL reconnaissance)
  – Routing security
• “Clean-slate” wish list
  – Better notions of identity
  – More agile monitoring/sampling techniques