Page 1: Measuring Adversaries

Measuring Adversaries

Vern Paxson International Computer Science Institute / Lawrence Berkeley National Laboratory

[email protected]

June 15, 2004

Page 2: Measuring Adversaries

[Figure: Internet growth over time, ≈ 80% growth/year. Data courtesy of Rick Adams.]

Page 3: Measuring Adversaries

[Figure: growth curve, ≈ 60% growth/year.]

Page 4: Measuring Adversaries

[Figure: growth curve, ≈ 596% growth/year.]

Page 5: Measuring Adversaries

The Point of the Talk

• Measuring adversaries is fun:
  – Increasingly of pressing interest
  – Involves misbehavior and sneakiness
  – Includes true Internet-scale phenomena
  – Under-characterized
  – The rules change

Page 6: Measuring Adversaries

The Point of the Talk, con’t

• Measuring adversaries is challenging:
  – Spans very wide range of layers, semantics, scope
  – New notions of “active” and “passive” measurement
  – Extra-thorny dataset problems
  – Very rapid evolution: arms race

Page 7: Measuring Adversaries

Adversaries & Evasion

• Consider passive measurement: scanning traffic for a particular string (“USER root”)

• Easiest: scan for the text in each packet
  – No good: text might be split across multiple packets
• Okay, remember text from previous packet
  – No good: out-of-order delivery
• Okay, fully reassemble byte stream
  – Costs state …
  – … and still evadable
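
A minimal sketch of the point (illustrative code, not from the talk): the same string that a per-packet scan misses is found once the byte stream is reassembled.

```python
NEEDLE = b"USER root"

def per_packet_match(packets):
    """Naive detector: look for the string inside each packet alone."""
    return any(NEEDLE in p for p in packets)

def reassembled_match(segments):
    """Order segments by sequence number, then match the byte stream."""
    stream = b"".join(payload for _, payload in sorted(segments))
    return NEEDLE in stream

# Attacker splits the string across two out-of-order TCP segments:
segments = [(3, b"R root\r\n"), (0, b"USE")]
packets = [payload for _, payload in segments]

print(per_packet_match(packets))    # False: evades the per-packet scan
print(reassembled_match(segments))  # True: caught after reassembly
```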

Page 8: Measuring Adversaries

Evading Detection Via Ambiguous TCP Retransmission
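
The figure itself isn’t reproduced in this transcript. As a hedged sketch of the trick it depicts (Scapy syntax, placeholder address): the monitor observes two different payloads for the same TCP sequence range and cannot tell which one the receiver accepted, especially if one copy carries a TTL too low to reach the end host.

```python
# Two "copies" of sequence 1000 carry different data; the monitor sees
# both, the receiver sees only whichever copy actually arrives.
from scapy.all import IP, TCP, Raw, send

dst = "192.0.2.1"  # illustrative placeholder address (TEST-NET)

# Copy 1: innocuous payload, TTL chosen too small to reach the receiver
send(IP(dst=dst, ttl=2) / TCP(dport=23, seq=1000, flags="PA")
     / Raw(b"USER nice\r\n"))

# Copy 2: "retransmission" of the same sequence range, different data
send(IP(dst=dst, ttl=64) / TCP(dport=23, seq=1000, flags="PA")
     / Raw(b"USER root\r\n"))
```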

Page 9: Measuring Adversaries

The Problem of Evasion

• Fundamental problem in passively measuring traffic on a link: network traffic is inherently ambiguous

• Generally not a significant issue for traffic characterization …

• … But is in the presence of an adversary: Attackers can craft traffic to confuse/fool monitor

Page 10: Measuring Adversaries

The Problem of “Crud”

• There are many such ambiguities attackers can leverage

• A type of measurement vantage-point problem

• Unfortunately, these occur in benign traffic, too:
  – Legitimate tiny fragments, overlapping fragments
  – Receivers that acknowledge data they did not receive
  – Senders that retransmit different data than originally
• In a diverse traffic stream, you will see these:
  – What is the intent?

Page 11: Measuring Adversaries

Countering Evasion-by-Ambiguity

• Involve end-host: have it tell you what it saw
• Probe end-host in advance to resolve vantage-point ambiguities (“active mapping”)
  – E.g., how many hops to it?
  – E.g., how does it resolve ambiguous retransmissions?
• Change the rules: perturb
  – Introduce a network element that “normalizes” the traffic passing through it to eliminate ambiguities
    • E.g., regenerate low TTLs (dicey!)
    • E.g., reassemble streams & remove inconsistent retransmissions
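
A toy sketch of the normalization idea (illustrative only; a real normalizer keeps proper per-connection reassembly buffers): remember the first payload byte seen at each sequence position and rewrite any later, inconsistent “retransmission” to match, so the monitor and the end host can no longer be shown two different streams.

```python
def normalize(flows, conn, seq, payload):
    """Return the (possibly rewritten) payload to forward."""
    stream = flows.setdefault(conn, {})     # conn -> {seq offset: byte}
    out = bytearray(payload)
    for i, b in enumerate(payload):
        prev = stream.get(seq + i)
        if prev is None:
            stream[seq + i] = b             # first sighting wins
        elif prev != b:
            out[i] = prev                   # inconsistent retransmission:
                                            # rewrite to the first version
    return bytes(out)

flows = {}
print(normalize(flows, ("A", "B"), 1000, b"USER nice\r\n"))
print(normalize(flows, ("A", "B"), 1000, b"USER root\r\n"))  # rewritten
```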

Page 12: Measuring Adversaries

Adversaries & Identity

• Usual notions of identifying services by port numbers and users by IP addresses become untrustworthy

• E.g., backdoors installed by attackers on non-standard ports to facilitate return / control

• E.g., P2P traffic tunneled over HTTP

• General measurement problem: inferring structure

Page 13: Measuring Adversaries

Adversaries & Identity: Measuring Packet Origins

• Muscular approach (Burch/Cheswick):
  – Recursively pound upstream routers to see which ones perturb the flooding stream
• Breadcrumb approach:
  – ICMP ISAWTHIS
    • Relies on high volume
  – Packet marking
    • Lower volume + intensive post-processing
    • Yaar’s PI scheme yields general tomography utility

⇒ Yields a general technique: the power of introducing a small amount of state inside the network
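
The slide names packet marking without a specific algorithm; here is a hedged sketch of one simple variant from the traceback literature (“node sampling” in the style of Savage et al., not necessarily the scheme meant here): each router overwrites a single mark field with probability p, and since nearer routers overwrite farther ones, the victim can order routers by how often each mark survives.

```python
import random
from collections import Counter

P = 0.5  # marking probability (illustrative)

def forward(packet, router):
    if random.random() < P:
        packet["mark"] = router            # overwrite the mark field

def reconstruct(marks):
    # Nearer routers' marks survive more often, so sort by frequency.
    return [r for r, _ in Counter(marks).most_common()]

# Simulate a flood along attacker -> R3 -> R2 -> R1 -> victim:
marks = []
for _ in range(100_000):
    pkt = {"mark": None}
    for router in ["R3", "R2", "R1"]:
        forward(pkt, router)
    if pkt["mark"]:
        marks.append(pkt["mark"])

print(reconstruct(marks))  # almost surely ['R1', 'R2', 'R3']
```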

Page 14: Measuring Adversaries

Adversaries & Identity: Measuring User Origins

• Internet attacks invariably do not come from the attacker's own personal machine, but from a stepping-stone: a previously-compromised intermediary.

• Furthermore, via a chain of stepping stones.
• Manually tracing the attacker back across the chain is virtually impossible.
• So: want to detect that a connection going into a site is closely related to one going out of the site.

• Active techniques? Passive techniques?

Page 15: Measuring Adversaries

Measuring User Origins, con’t

• Approach #1 (SH94; passive): Look for similar text
  – For each connection, generate a 24-byte thumbprint summarizing per-minute character frequencies
• Approach #2 (USAF94), a particularly vigorous form of active measurement:
  – Break in to the upstream attack site
  – Rummage through its logs
  – Recurse
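
A minimal sketch of the thumbprint idea (illustrative; the slide gives only “24 bytes of per-minute character frequencies,” so the folding and scaling here are assumptions, not the published SH94 scheme). Two legs of the same stepping-stone chain carry similar text, so their thumbprints should be close under, e.g., L1 distance.

```python
from collections import Counter

def thumbprint(minute_of_text, width=24):
    """Fold character frequencies into `width` byte-sized buckets."""
    buckets = [0] * width
    for ch, n in Counter(minute_of_text).items():
        buckets[ord(ch) % width] += n
    total = sum(buckets) or 1
    return bytes(b * 255 // total for b in buckets)

a = thumbprint("ls -l /etc; cat /etc/passwd")
b = thumbprint("ls -l /etc; cat /etc/passwd ")
print(sum(abs(x - y) for x, y in zip(a, b)))   # small L1 distance
```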

Page 16: Measuring Adversaries

Measuring User Origins, con’t

• Approach #3 (ZP00; passive): Leverage unique on/off pattern of user login sessions:
  – Look for connections that end idle periods at the same time.
  – Two idle periods are correlated if their ending times differ by ≤ δ sec.
  – If enough periods coincide ⇒ stepping-stone pair.
  – For an A → B → C stepping stone, just 2 correlations suffice.
  – (For A → B → … → C → D, 4 suffice.)
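
A sketch of the on/off correlation (parameters illustrative, not ZP00’s published constants): flag two connections as a stepping-stone pair once enough of their idle periods end within δ of each other.

```python
DELTA = 0.5        # δ: max difference (sec) between idle-period endings
MIN_COINCIDE = 2   # correlations needed to flag a pair

def coincidences(ends_a, ends_b, delta=DELTA):
    """Count A's idle-period endings with a matching ending in B."""
    return sum(1 for ta in ends_a
               if any(abs(ta - tb) <= delta for tb in ends_b))

def stepping_stone_pair(ends_a, ends_b):
    return coincidences(ends_a, ends_b) >= MIN_COINCIDE

inbound  = [12.0, 47.3, 95.1]   # idle periods on the incoming connection
outbound = [12.2, 47.4, 95.3]   # ... end at nearly the same times outbound
print(stepping_stone_pair(inbound, outbound))   # True
```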

Page 17: Measuring Adversaries

Measuring User Origins, con’t

• Works very well, even for encrypted traffic
• But: easy to evade, if attacker cognizant of algorithm
  – C’est la arms race

• And: also turns out there are frequent legit stepping stones

• Untried active approach: imprint traffic with low-frequency timing signature unique to each site (“breadcrumb”). Deconvolve recorded traffic to extract.

Page 18: Measuring Adversaries

Global-scale Adversaries: Worms

• Worm = self-replicating/self-propagating code
• Spreads across a network by exploiting flaws in open services, or fooling humans (viruses)
• Not new: Morris Worm, Nov. 1988
  – 6-10% of all Internet hosts infected
• Many more small ones since … … but the phenomenon came into its own in July 2001

Page 19: Measuring Adversaries

Code Red

• Initial version released July 13, 2001.• Exploited known bug in Microsoft IIS Web

servers.• 1st through 20th of each month: spread.

20th through end of each month: attack.• Spread: via random scanning of 32-bit

IP address space.• But: failure to seed random number generator

linear growth reverse engineering enables forensics
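
A sketch of why the missing seed mattered (illustrative; not Code Red’s actual generator): every copy starting from the same hard-coded seed walks the identical scan sequence, so new infectees mostly re-probe already-probed addresses and growth stays roughly linear instead of exponential.

```python
import random

FIXED_SEED = 0x12345678   # stands in for the worm's hard-coded seed

def scan_list(n, seed=FIXED_SEED):
    rng = random.Random(seed)
    return [rng.randrange(2**32) for _ in range(n)]

# Two different infectees generate the very same target list:
print(scan_list(5) == scan_list(5))   # True
```

The same predictability is what makes reverse engineering the generator useful for forensics.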

Page 20: Measuring Adversaries

Code Red, con’t

• Revision released July 19, 2001.
• Payload: flooding attack on www.whitehouse.gov.
• Bug led to it dying for date ≥ 20th of the month.
• But: this time the random number generator was correctly seeded. Bingo!

Page 21: Measuring Adversaries

Worm dies on July 20th, GMT

Page 22: Measuring Adversaries

Measuring Internet-Scale Activity: Network Telescopes

• Idea: monitor a cross-section of Internet address space to measure network traffic involving a wide range of addresses
  – “Backscatter” from DoS floods
  – Attackers probing blindly
  – Random scanning from worms
• LBNL’s cross-section: 1/32,768 of the Internet
  – Small enough for appreciable telescope lag
• UCSD’s, UWisc’s cross-sections: 1/256.
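
A back-of-envelope not on the slide: for a telescope covering fraction $f$ of the address space and a source emitting $n$ uniformly random probes,

\[
\Pr[\text{telescope sees at least one probe}] = 1 - (1 - f)^n,
\qquad
\mathbb{E}[\text{probes until first hit}] = \frac{1}{f},
\]

so LBNL’s $f = 1/32{,}768$ expects on the order of 33K probes before its first sighting (the lag noted above), while a $1/256$ telescope expects one within a few hundred.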

Page 23: Measuring Adversaries

Spread of Code Red

• Network telescopes give lower bound on # infected hosts: 360K.

• Course of infection fits classic logistic.

• That night (the 20th), worm dies … … except for hosts with inaccurate clocks!

• It just takes one of these to restart the worm on August 1st …
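
For reference, the standard random-scanning epidemic model behind that “classic logistic” fit (not spelled out on the slide): with $i(t)$ the infected fraction and $\beta$ the effective contact rate,

\[
\frac{di}{dt} = \beta\, i\,(1 - i)
\quad\Longrightarrow\quad
i(t) = \frac{i_0\, e^{\beta t}}{1 - i_0 + i_0\, e^{\beta t}},
\]

which grows exponentially while $i \ll 1$ and saturates as the vulnerable population is exhausted.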

Page 24: Measuring Adversaries

Could parasitically analyze sample of 100K’s of clocks!

Page 25: Measuring Adversaries

The Worms Keep Coming

• Code Red 2:
  – August 4th, 2001
  – Localized scanning: prefers nearby addresses
  – Payload: root backdoor
  – Programmed to die Oct 1, 2001
• Nimda:
  – September 18, 2001
  – Multi-mode spreading, including via Code Red 2 backdoors!
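
A sketch of what “prefers nearby addresses” means for Code Red 2 (published analyses report roughly a 1/2 same-/8, 3/8 same-/16, 1/8 anywhere split; treat the constants as approximate):

```python
import random

def next_target(my_ip):
    """Pick the next scan target, biased toward the infectee's network."""
    a, b, _, _ = my_ip
    r = random.random()
    if r < 3 / 8:                            # same /16 as the infectee
        return (a, b, random.randrange(256), random.randrange(256))
    if r < 3 / 8 + 1 / 2:                    # same /8
        return (a, random.randrange(256), random.randrange(256),
                random.randrange(256))
    return tuple(random.randrange(256) for _ in range(4))   # anywhere

print(next_target((131, 243, 1, 10)))        # hypothetical infectee
```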

Page 26: Measuring Adversaries

[Figure: worm activity over time, as seen at the telescope]
– Code Red 2 kills off Code Red 1
– Code Red 2 settles into weekly pattern
– Nimda enters the ecosystem
– Code Red 2 dies off as programmed
– CR 1 returns thanks to bad clocks

Page 27: Measuring Adversaries

[Figure, continued]
– Code Red 2 dies off as programmed
– Nimda hums along, slowly cleaned up
– With its predator gone, Code Red 1 comes back, still exhibiting its monthly pattern!

Page 28: Measuring Adversaries

[Figure, continued into 2003-2004]
– 80% of Code Red 2 cleaned up due to onset of Blaster
– Code Red 2 re-released with Oct. 2003 die-off
– Code Red 1 and Nimda endemic
– Code Red 2 re-re-released Jan 2004
– Code Red 2 dies off again

Page 29: Measuring Adversaries

Detecting Internet-Scale Activity

• Telescopes can measure activity, but what does it mean??

• Need to respond to traffic to ferret out intent

• Honeyfarm: a set of “honeypots” fed by a network telescope

• Active measurement w/ an uncooperative (but stupid) remote endpoint

Page 30: Measuring Adversaries

Internet-Scale Adversary Measurement via Honeyfarms

• Spectrum of response ranging from simple/cheap auto-SYN acking to faking higher levels to truly executing higher levels

• Problem #1: Bait
  – Easy for random-scanning worms, “auto-rooters”
  – But for “topological” or “contagion” worms, need to seed the honeyfarm into the application network ⇒ huge challenge
• Problem #2: Background radiation
  – Contemporary Internet traffic is rife with endemic malice. How to ignore it??
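
At the cheap end of that response spectrum, a responder can be very simple. A minimal sketch (single hypothetical port; a real honeyfarm answers across an entire telescope’s address space): complete the handshake and record the probe’s first payload, which is often enough to classify intent.

```python
import socket

def cheap_responder(port=8080):              # hypothetical port
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(128)
    while True:
        conn, peer = srv.accept()            # kernel already SYN-ACKed
        conn.settimeout(5.0)
        try:
            first = conn.recv(4096)          # the probe's first payload
            print(peer[0], first[:80])
        except socket.timeout:
            print(peer[0], "<no payload>")
        finally:
            conn.close()
```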

Page 31: Measuring Adversaries

Measuring Internet Background Radiation, 2004

• For a good-sized telescope, must filter:
  – E.g., UWisc /8 telescope sees 30 Kpps of traffic heading to non-existent addresses
• Would like to filter by intent, but initially don’t know enough
• Schemes, per source:
  – Take first N connections
  – Take first N connections to K different ports
  – Take first N different payloads
  – Take all traffic the source sends to first N destinations
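
The first scheme in the list above is a one-liner in spirit; a minimal sketch (N illustrative):

```python
from collections import defaultdict

N = 10                         # connections to keep per source
kept = defaultdict(int)        # source IP -> connections kept so far

def keep(src):
    """Record only the first N connections from each source."""
    if kept[src] < N:
        kept[src] += 1
        return True
    return False
```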

Page 32: Measuring Adversaries

Responding to Background Radiation

Page 33: Measuring Adversaries

Hourly Background Radiation Seen at a 2,560-address Telescope

Page 34: Measuring Adversaries
Page 35: Measuring Adversaries

Measuring Internet-scale Adversaries: Summary

• New tools & forms of measurement:
  – Telescopes, honeypots, filtering
• New needs to automate measurement:
  – Worm defense must be faster-than-human
• The lay of the land has changed:
  – Endemic worms, malicious scanning
  – Majority of Internet connection (attempts) are hostile (80+% at LBNL)
• Increasing requirement for application-level analysis

Page 36: Measuring Adversaries

The Huge Dataset Headache

• Adversary measurement particularly requires packet contents
  – Much analysis is application-layer
• Huge privacy/legal/policy/commercial hurdles
• Major challenge: anonymization/agent technologies
  – E.g., [PP03] “semantic trace transformation”
  – Use an intrusion detection system’s application analyzers to anonymize the trace at the semantic level (e.g., filenames vs. users vs. commands)
  – Note: general measurement increasingly benefits from such application analyzers, too
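
A toy sketch of semantic-level anonymization (illustrative; [PP03]’s actual transformation is considerably richer): parse the application dialog, then pseudonymize each field according to its semantic role instead of scrubbing raw bytes.

```python
import hashlib

def pseudonym(value, role):
    """Consistently pseudonymize a field, keyed by its semantic role."""
    digest = hashlib.sha256(f"{role}:{value}".encode()).hexdigest()
    return f"{role}-{digest[:8]}"

def transform_ftp_line(line):
    cmd, _, arg = line.partition(" ")
    if cmd == "USER":
        return f"USER {pseudonym(arg, 'user')}"     # hide identities
    if cmd in ("RETR", "STOR"):
        return f"{cmd} {pseudonym(arg, 'file')}"    # hide filenames
    return line                                     # keep dialog structure

print(transform_ftp_line("USER alice"))
print(transform_ftp_line("RETR /etc/passwd"))
```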

Page 37: Measuring Adversaries

Attacks on Passive Monitoring

• State-flooding:
  – E.g., if tracking connections, each new SYN requires state; each undelivered TCP segment requires state
• Analysis flooding:
  – E.g., stick, snot, trichinosis
• But surely, if we’re just peering at the adversary, we’re ourselves safe from direct attack?

Page 38: Measuring Adversaries

Attacks on Passive Monitoring

• Exploits for bugs in passive analyzers!
• Suppose a protocol analyzer has an error parsing an unusual type of packet
  – E.g., tcpdump and malformed options
• Adversary crafts such a packet, overruns a buffer, causes the analyzer to execute arbitrary code
• E.g., Witty, BlackICE & packets sprayed to random UDP ports
  – 12,000 infectees in < 60 minutes!

Page 39: Measuring Adversaries

Summary

• The lay of the land has changed
  – Ecosystem of endemic hostility
  – “Traffic characterization” of adversaries is as ripe as characterizing regular Internet traffic was 10 years ago
  – People care
• Very challenging:
  – Arms race
  – Heavy on application analysis
  – Major dataset difficulties

Page 40: Measuring Adversaries

Summary, con’t

• Revisit “passive” measurement:
  – evasion
  – telescopes / Internet scope
  – no longer an isolated observer, but vulnerable
• Revisit “active” measurement:
  – perturbing traffic to unmask hiding & evasion
  – engaging the attacker to discover intent
• IMHO, this is “where the action is” …
• … And the fun!