Spam and Botnets: Characterization and Mitigation

Nick Feamster

Anirudh Ramachandran

David Dagon

Georgia Tech

Talk Overview

• Network-level behavior of spammers
  – Ultimate goal: Construct spam filters based on network-level properties, rather than content
  – Content-based properties are malleable
    • Low cost to evasion: Spammers can easily alter content
    • High admin cost: Filters must be continually updated
  – Content-based filters are applied at the destination
    • Too little, too late: Wasted network bandwidth, storage, etc.
• Study of DNS-based blacklists
• “Discovery”: One of the most telling network-level properties is botnet membership
  – DNSBL Counter-Intelligence
  – Network monitoring

Network-Level Behavior of Spammers: Major Findings

• Where does spam come from?
  – Most received from few regions of IP address space
  – Insight about spammier prefixes could improve filters
• Do spammers hijack routes?
  – A small set of spammers continually advertise short-lived routes
  – Traceability is not guaranteed
• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

Data Collection

• Two domains instrumented with MailAvenger (both on same network)
  – Sinkhole domain #1
    • Continuous spam collection since Aug 2004
    • No real email addresses: sink everything
    • 10 million+ pieces of spam
• Legitimate mail corpus from a large email provider (40 million inboxes)
• Monitoring BGP route advertisements from same network

Mail Collection: MailAvenger

• Highly configurable SMTP server that collects many useful statistics

BGP Data Collection

Spam Study: Major Findings

• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

What IP ranges does spam come from?

Spam comes from a few concentrated regions of IP address space

[Figure: x-axis: /24 prefix]
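A view of spam volume per /24 like the one described here can be approximated from a sinkhole log by masking each sender address to its /24 and counting. The sketch below is illustrative only: the function name `spam_by_slash24` and the sample addresses (taken from documentation ranges) are assumptions, not part of the original study.

```python
from collections import Counter
import ipaddress


def spam_by_slash24(sender_ips):
    """Count messages per /24 prefix; the input has one entry per message."""
    counts = Counter()
    for ip in sender_ips:
        # strict=False masks off the host bits, giving the enclosing /24.
        prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(prefix)] += 1
    return counts


# Hypothetical sender addresses, for illustration only.
sample_senders = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "192.0.2.200"]
slash24_counts = spam_by_slash24(sample_senders)
```

Sorting the result with `slash24_counts.most_common()` gives the "spammiest" prefixes first, which is the kind of aggregate a prefix-based filter could consume.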

Distribution across ASes

• Top two spamming ASes: 10% of received spam
• ASes in the US: most spam
• Top ASes for legitimate email are different

[Tables: Top 10 ASes by Spam Count; Top 10 ASes by Legit Email Count]

Points to note

• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

BGP Spectrum Agility

• Log IP addresses of SMTP relays
• Join with BGP route advertisements seen at the network where the spam trap is co-located

A small club of persistent players appears to be using this technique.
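The join described above can be sketched as follows: for each piece of spam, check whether the sender's address was covered by a route that was announced only briefly around the time the mail arrived. The `Route` record, the function name, and the 600-second `max_lifetime` default are hypothetical choices for illustration; the cutoff in particular is arbitrary, and real BGP data would need parsing from update logs first.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str        # announced prefix, e.g. "203.0.113.0/24"
    announced: float   # time the announcement was seen (seconds)
    withdrawn: float   # time the withdrawal was seen (seconds)


def spam_from_short_lived_routes(spam_log, routes, max_lifetime=600.0):
    """Return (ip, timestamp, prefix) for spam that arrived while a
    short-lived route covering the sender was being announced."""
    short_lived = [
        (ipaddress.ip_network(r.prefix), r)
        for r in routes
        if r.withdrawn - r.announced <= max_lifetime
    ]
    hits = []
    for ip, ts in spam_log:
        addr = ipaddress.ip_address(ip)
        for net, route in short_lived:
            if addr in net and route.announced <= ts <= route.withdrawn:
                hits.append((ip, ts, route.prefix))
    return hits


# Hypothetical data: one short-lived route and one long-lived route.
routes = [
    Route("203.0.113.0/24", announced=1000.0, withdrawn=1500.0),
    Route("198.51.100.0/24", announced=0.0, withdrawn=100000.0),
]
spam_log = [("203.0.113.9", 1200.0), ("203.0.113.9", 2000.0), ("198.51.100.7", 1200.0)]
hits = spam_from_short_lived_routes(spam_log, routes)
```

The nested loop is fine for a sketch; at the scale of a real sinkhole, a prefix trie or interval index would be a more sensible data structure.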

Common short-lived prefixes and ASes

Prefix        AS
61.0.0.0/8    4678
66.0.0.0/8    21562
82.0.0.0/8    8717

[Figure annotation: ~ 10 minutes]

Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)

Why Such Big Prefixes?

• Flexibility: Client IPs can be scattered throughout dark space within a large /8
  – Same sender usually returns with different IP addresses
• Visibility: Route typically won’t be filtered (nice and short)

• How is spam sent?
  – Most coming from Windows hosts (likely, bots)
  – Identification of spamming groups (e.g., botnets) could help

Characteristics of spamming bots

• Distribution across IP space for bots
  – Similar to IP space distribution for all spam
  – Lower bot activity in ranges where spam also comes from hijacked routes
• Operating systems of spamming hosts
  – ~95% run Windows
  – The ~4% of hosts that are Unix-based send up to 8% of the spam

Most Bots Send Low Volumes of Spam

Most bot IP addresses send very little spam, regardless of how long they have been spamming…

[Figure: x-axis: Lifetime (seconds); annotation: 99% of bots]

Most Bot IP addresses are quiet

65% of bots only send mail to a domain once over 18 months

Blacklists may want to target IP ranges, rather than individual IPs

[Figure: x-axis: Lifetime (seconds)]

Take-Away Lessons

• Network-level properties are less malleable, and are observable closer to the source of spam
• Aggregate properties (e.g., IP prefix, ASN, route used, etc.) may be more effective
• Some network-level properties can be incorporated into spam filters
  – Could be used as a first-pass filter
• Spam filtering requires a better notion of end-host identity
• Securing the Internet routing infrastructure is key to traceability

Network-Level Spam Filtering

Redefining End-Host Identifiers

• DNS-based Blacklists (DNSBLs)
  – The most prevalent network-level spam filtering mechanism today
  – Various criteria: open relays/proxies, virus senders, bad/unused address spaces, etc.
  – Hundreds of DNSBLs of all sizes
• How to measure the effectiveness of DNSBLs?
  – Completeness
  – Responsiveness
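For readers unfamiliar with how a DNSBL is consulted: by convention, the IPv4 address being checked is reversed octet by octet and prepended to the blacklist's zone, and a listed address resolves to an A record in 127.0.0.0/8, while an unlisted one returns no record. The sketch below only builds the query name and interprets an answer; it performs no DNS lookups. The zone `dnsbl.example` and both function names are placeholders, not a real blacklist or API.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask `zone` about IPv4 address `ip`."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed_response(a_records):
    """Interpret the A records returned for a DNSBL query.

    An empty answer (e.g., NXDOMAIN) means 'not listed'; any address in
    127.0.0.0/8 is treated as 'listed'.
    """
    listed_range = ipaddress.ip_network("127.0.0.0/8")
    return any(ipaddress.ip_address(r) in listed_range for r in a_records)


# Hypothetical zone name, for illustration only.
query_name = dnsbl_query_name("192.0.2.99", "dnsbl.example")
```

Because each check is an ordinary DNS query, anyone operating a mirror of the blacklist can see which addresses are being looked up and by whom, which is what makes the query logs used later in the talk possible.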

What about DNS-Based Blacklists?

• What is the completeness of the DNSBL?
• What is the responsiveness of the DNSBL?
  – How many distinct domains are targeted by a spamming host before it is blacklisted?
• Does frequency of spam from a host change after it is blacklisted?

Questions about DNSBLs

Blacklisting: Completeness

~80% listed on average

~95% of bots listed in one or more blacklists

Number of DNSBLs listing this spammer

Only about half of the IPs spamming from short-lived BGP routes are listed in any blacklist

IP-agile senders tend to be listed in fewer blacklists

Are IP-Based Blacklists Enough?

• Mail Avenger is very aggressive
  – Eight different blacklists
• Cloaking techniques complicate detection
  – For example, what if a bot could change IP addresses and remain reachable?
    • LAN agility
    • BGP agility
• Response Time
  – Difficult to calculate without “ground truth”
  – Can still estimate lower bound

[Figure: Lifecycle of a spamming host. Timeline labels: Infection, Possible Detection Opportunity, RBL Listing, Response Time]

A Model of Responsiveness

• Data
  – 1.5 days' worth of packet captures of DNSBL queries from a mirror of Spamhaus
  – 46 days of pcaps from a hijacked C&C for a Bobax botnet; overlaps with DNSBL queries
• Method
  – Monitor DNSBL for lookups for known Bobax hosts
    • Look for first query
    • Look for the first time a query response had a ‘listed’ status
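The method above can be sketched as a single pass over a query log: for each known bot, record the time of the first lookup and the time of the first lookup answered as ‘listed’, then take the difference. The log format `(timestamp, queried_ip, listed)` and the function name are assumptions made for illustration. The result is only an estimate, since the trace may begin after the host's first real detection opportunity.

```python
def response_times(query_log, known_bots):
    """Estimate per-bot response time from DNSBL query records.

    query_log: iterable of (timestamp, queried_ip, listed) tuples.
    Returns {ip: first_listed_time - first_query_time} for every known bot
    that was seen with a 'listed' response.
    """
    first_query = {}
    first_listed = {}
    # Process records in time order so "first" means earliest.
    for ts, ip, listed in sorted(query_log, key=lambda rec: rec[0]):
        if ip not in known_bots:
            continue
        first_query.setdefault(ip, ts)
        if listed:
            first_listed.setdefault(ip, ts)
    return {ip: first_listed[ip] - first_query[ip] for ip in first_listed}


# Hypothetical records, for illustration only.
bots = {"192.0.2.1", "192.0.2.2"}
log = [
    (10.0, "192.0.2.1", False),
    (50.0, "192.0.2.1", True),
    (20.0, "192.0.2.2", False),   # queried but never seen listed
    (5.0, "198.51.100.9", True),  # not a known bot; ignored
    (70.0, "192.0.2.1", True),
]
delays = response_times(log, bots)
```

Bots that are queried but never appear as listed (such as `192.0.2.2` here) are omitted from the result; counting them separately gives a rough completeness figure for the same trace.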

Measuring Responsiveness

• Observed 81,950 DNSBL queries for 4,295 (out of over 2 million) Bobax IPs
• Only 255 (6%) of the queried Bobax IPs were blacklisted through the end of the Bobax trace (46 days)
  – 88 IPs became listed during the 1.5 day DNSBL trace
  – 34 of these were listed after a single detection opportunity

Both responsiveness and completeness appear to be low. Much room for improvement.

Responsiveness: Preliminary Results

• Over 60% are queried by just one IP/AS
  – Hypothesis: Decreased chances of being reported

Domains Performing Lookups

So…What can be Done?

• Network-level behavior of spammers
  – Ultimate goal: Construct spam filters based on network-level properties, rather than content
  – Content-based properties are malleable
    • Low cost to evasion: Spammers can easily alter content
    • High admin cost: Filters must be continually updated
  – Content-based filters are applied at the destination
    • Too little, too late: Wasted network bandwidth, storage, etc.
• Study of DNS-based blacklists
• “Discovery”: One of the most telling network-level properties is botnet membership
  – DNSBL Counter-Intelligence
  – Network monitoring

Mitigation #1: Counter-Intelligence

• Botmasters advertise spamming bots that are not listed in any blacklist.

• Insight: Someone must be looking up the bots!

• Can we fish out these DNSBL “reconnaissance” queries and identify subjects/targets as suspect?

Legit Queries vs. Reconnaissance

• Legitimate queriers are also the targets of queries

• Reconnaissance queriers are usually not queried themselves

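[Figure: Legit Mail Server A (mx.a.com) and Legit Mail Server B (mx.b.com) exchange email (labels: email to mx.a.com, email to mx.b.com), with matching lookups at the DNS-based blacklist (labels: lookup mx.a.com, lookup mx.b.com)]

[Figure: a reconnaissance host querying the DNS-based blacklist]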

Measurement Approach

• Log Spamhaus queries

• Construct querier/queried graph

• Prune graph: only nodes in the Bobax trace

• Examine nodes with high out-degree
  – Hypothesis: targets of nodes with high out-degree are likely bots
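The measurement steps above can be sketched as a small graph computation over (querier, queried) pairs. The sketch makes two assumptions that the slides do not spell out: pruning keeps an edge when either endpoint is a known Bobax address, and a querier is flagged only if it has high out-degree and is never queried itself (following the observation that legitimate mail servers are also the targets of queries). The function name, the `min_out_degree` threshold, and the sample addresses are hypothetical.

```python
from collections import defaultdict


def suspicious_queriers(queries, known_bots, min_out_degree=2):
    """Return [(querier, out_degree)] for likely reconnaissance hosts.

    queries: iterable of (querier_ip, queried_ip) pairs from a DNSBL log.
    """
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for querier, queried in queries:
        # Prune: keep only edges that touch the known-bot set (assumption).
        if querier not in known_bots and queried not in known_bots:
            continue
        out_edges[querier].add(queried)
        in_edges[queried].add(querier)
    flagged = [
        (node, len(targets))
        for node, targets in out_edges.items()
        if len(targets) >= min_out_degree and not in_edges[node]
    ]
    # Highest out-degree first; ties broken by address for stable output.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Hypothetical addresses, for illustration only.
recon, mail_server = "203.0.113.5", "198.51.100.25"
bot1, bot2 = "192.0.2.1", "192.0.2.2"
queries = [
    (recon, bot1), (recon, bot2), (recon, "198.51.100.99"),
    (mail_server, bot1), (mail_server, bot2), (bot1, mail_server),
]
flagged = suspicious_queriers(queries, {bot1, bot2})
```

In this toy input the mail server has the same out-degree as the reconnaissance host but is itself queried, so only the reconnaissance host is flagged; its targets would then be the candidates to treat as suspect.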

Who’s Doing the Lookups?

• The botmaster, on behalf of the bots
• The bots, on behalf of themselves
• The bots, on behalf of each other

[Figure labels: Spam Sinkhole; Known Bobax drone!]

Implication: Use a “seed” set to bootstrap?

Some Problems with Counter-Intel

• Constructing the query graph is intensive
  – Computationally
  – Storage-wise

• Initially pruning the graph with IP addresses of known suspects (e.g., spammers) could help

Mitigation: Network Monitoring

• In-network filtering
  – Requires the ability to detect botnets

• Question: Can we detect botnets by observing communication structure among hosts?

Example: Migration between command and control hosts

New type of problem: essentially coupon collection

How good are current traffic sampling techniques at exposing these patterns?

Experimental Setup

(Preliminary) Results

Feasible sampling rates

Conventional sampling techniques are not well-suited to collecting conversations
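One way to see why uniform packet sampling struggles here is a back-of-the-envelope model, which is an illustration and not the experimental setup used in the talk. Assume each packet is sampled independently with probability p, and treat a "conversation" as observed only if at least one packet is sampled in each direction. Both function names and the example numbers are hypothetical.

```python
def p_flow_sampled(num_packets, sampling_rate):
    """Probability that at least one of `num_packets` packets is sampled,
    assuming independent uniform sampling at `sampling_rate`."""
    return 1.0 - (1.0 - sampling_rate) ** num_packets


def p_conversation_sampled(pkts_a_to_b, pkts_b_to_a, sampling_rate):
    """Probability that both directions of a conversation are each
    sampled at least once (directions assumed independent)."""
    return (p_flow_sampled(pkts_a_to_b, sampling_rate)
            * p_flow_sampled(pkts_b_to_a, sampling_rate))


# Example: a short exchange of 10 packets each way at 1-in-100 sampling.
one_way = p_flow_sampled(10, 0.01)
both_ways = p_conversation_sampled(10, 10, 0.01)
```

Under these assumptions, the chance of seeing both directions of a short exchange is the product of two already small probabilities, so brief control-channel conversations are far less likely to be captured than large one-way flows.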

Summary Lessons

• Network-level spam filtering holds promise
  – Potentially a useful complement to content-based filters
  – Today’s DNSBLs aren’t doing the trick
• Two critical pieces
  – Monitoring techniques (which might be used together)
    • In-network (e.g., with better traffic monitoring techniques)
    • At the edge (e.g., DNSBL reconnaissance)
  – Routing security
• “Clean-slate” wish list
  – Better notions of identity
  – More agile monitoring/sampling techniques
