A MULTIFACETED APPROACH TO UNDERSTANDING THE BOTNET PHENOMENON (2006)
Jonathan Brant
CAP 6135 – Spring 2010
Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis
Computer Science Department
Johns Hopkins University
Overview
Introduction
Background
Measurement Methodology
  Malware Collection
  Graybox Testing
  Longitudinal Tracking of Botnets
Results and Analysis
  Botnet Prevalence
  Spreading Methods
  Growth Patterns
  Botnet Structures
  Effective Botnet Size
  Lifetime
  “Insider’s view”
Conclusion
Introduction
Botnets: “networks of infected end-hosts that are under the control of a human operator”
  Bots: the infected end-hosts
  Botmaster: the human operator
Command and control (C2) channels carry the botmaster's commands to the bots in the botnet
  Channels can use different communication mechanisms (e.g., P2P)
  Most modern botnets use Internet Relay Chat (IRC)
    IRC was originally designed for large chat rooms
Introduction
Botnets are almost always used for illegal activities, such as:
  Extortion
  E-mail spamming
  Identity theft
  Software piracy
Introduction
The paper attempts to answer questions such as:
  How many botnet “species” exist?
  How can the different species be categorized by behavior?
  How does a botnet evolve over time?
Background
Step 1: Botnets commandeer victims by remotely exploiting a vulnerability in software running on the victim
Infection strategies include:
  Self-replicating worms
  E-mail viruses
  Social engineering (convincing victims to run malicious code on their machines)
Background
Step 2: The victim executes the shellcode, and an image of the bot binary is fetched from a location within the botnet
  When the fetch is complete, the binary installs itself on the target machine and starts automatically on each reboot
Background
Step 3: The bot attempts to contact its IRC server (the address is stored in the executable)
  Using a DNS name instead of an IP address lets the botmaster retain control if the IP address is blacklisted by an ISP, because the name can simply be pointed at a new address
Background
Step 4: The bot attempts to establish an IRC session and join the C2 channel, which involves three authentication steps:
  The bot authenticates itself using the PASS message
    This is the IRC session password
  The bot issues the C2 channel password
    Both this password and the session password are stored in the bot binary
  The botmaster authenticates to the bot population
    This prevents other botmasters from seizing control of the botnet
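The first two authentication steps are ordinary IRC protocol messages. The sketch below only formats those lines, assuming RFC 2812 syntax (the `USER <user> <mode> <unused> :<realname>` form and a channel key passed as the second `JOIN` parameter); the function name and every argument value are hypothetical placeholders. The third step (the botmaster authenticating to the bots) happens inside the channel in a bot-specific way, so it is not shown.

```python
from typing import List


def registration_lines(session_pass: str, nick: str, user: str,
                       channel: str, channel_key: str) -> List[str]:
    """Format the IRC lines for a password-protected session and channel join.

    Uses RFC 2812 message syntax; every argument value is a placeholder.
    """
    return [
        f"PASS {session_pass}\r\n",           # step 1: IRC session password
        f"NICK {nick}\r\n",
        f"USER {user} 0 * :{user}\r\n",       # RFC 2812: <user> <mode> <unused> :<realname>
        f"JOIN {channel} {channel_key}\r\n",  # step 2: channel password sent as the JOIN key
    ]


for line in registration_lines("sessionpw", "node01", "node", "#chan", "chanpw"):
    print(line, end="")
```

Because both passwords travel as plain protocol parameters, anyone holding the binary can read them, which is why the separate botmaster-to-bot authentication step matters.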
Background
Step 5: The channel topic is parsed and executed
  The topic contains the default command that every bot executes
Future commands from the botmaster can vary widely
  The wide variety of available commands and responses makes botnet behaviors harder to classify
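On joining a channel, an IRC client receives the topic as an `RPL_TOPIC` (numeric 332) reply. A minimal sketch, assuming the RFC 2812 reply format `:<server> 332 <nick> <channel> :<topic>`, of how the topic text (and therefore the default command) could be extracted from that line; the function name and the sample command text are hypothetical.

```python
from typing import Optional, Tuple


def parse_rpl_topic(line: str) -> Optional[Tuple[str, str]]:
    """Return (channel, topic) from an RPL_TOPIC (332) reply, else None."""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        _prefix, _, line = line.partition(" ")  # drop the ":server" prefix
    head, sep, trailing = line.partition(" :")  # trailing parameter = topic text
    params = head.split()
    if not sep or len(params) < 3 or params[0] != "332":
        return None
    return params[2], trailing


print(parse_rpl_topic(":irc.example.net 332 node01 #chan :.cmd arg1 arg2"))
# ('#chan', '.cmd arg1 arg2')
```

Splitting the returned topic on whitespace would then yield the command word and its arguments.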
Measurement Methodology
Data collection includes three phases:
  Malware collection
  Binary analysis via graybox testing
  Tracking of IRC botnets through IRC and DNS trackers
Measurement | Malware Collection
Goal is to collect as many bot binaries as possible
  Must support a wide array of data collection endpoints and be highly scalable
Darknet: an allocated but unused portion of the IP address space
  Locally deployed darknet
  Distributed darknet: 14 nodes on the PlanetLab testbed
Measurement | Malware Collection
Modified nepenthes platform
  Mimics the replies generated by vulnerable services
  Collects the first-stage exploit (shellcode)
Raw packets from PlanetLab nodes are translated using a translation module written in Click
  The translated packets are injected into a local tunneling interface
Measurement | Malware Collection
Online download modules in nepenthes were disabled to prevent excessive downloads
Binaries were retrieved by generating a list of URL targets and sending it to a download station
  The download station filtered the entries in the list and extracted the unique sources/URLs
Measurement | Malware Collection
Honeynet catches exploits missed by nepenthes
  Composed of honeypots running unpatched virtual instances of Windows XP
  Each honeypot is assigned a private static IP address on a separate VLAN
Infected honeypots sustain their IRC connections until the VM is re-imaged
Suspect binaries are retrieved by comparing the VM's contents to a clean Windows image
Measurement | Malware Collection
Gateway routes darknet traffic to the various parts of the internal network
  Half of the darknet prefixes are directed to the local responder and the other half to the honeynet
  NAT is used to map each honeypot to 128 darknet IP addresses
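The gateway's routing rule can be sketched as below. Which prefixes count as the "first half", the use of consecutive 128-address blocks per honeypot, and the documentation prefixes in the example are all assumptions for illustration; the slide only states the 50/50 split and the 128-address NAT mapping.

```python
import ipaddress
from typing import List, Optional, Tuple

ADDRS_PER_HONEYPOT = 128  # slide: NAT maps each honeypot to 128 darknet addresses


def route(dst: str, prefixes: List[str]) -> Tuple[str, Optional[int]]:
    """Decide where darknet traffic to `dst` goes (hypothetical split rule).

    The first half of `prefixes` goes to the local responder (nepenthes);
    the second half goes to the honeynet, where each consecutive block of
    ADDRS_PER_HONEYPOT addresses is NATed to one honeypot.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    half = len(nets) // 2
    addr = ipaddress.ip_address(dst)
    offset = 0  # addresses in honeynet prefixes that precede the matching one
    for i, net in enumerate(nets):
        if addr in net:
            if i < half:
                return "responder", None
            index = offset + int(addr) - int(net.network_address)
            return "honeynet", index // ADDRS_PER_HONEYPOT
        if i >= half:
            offset += net.num_addresses
    return "not-darknet", None


darknet = ["198.51.100.0/24", "203.0.113.0/24"]  # documentation prefixes
print(route("198.51.100.7", darknet))   # ('responder', None)
print(route("203.0.113.200", darknet))  # ('honeynet', 1)
```

With this layout, a /24 sent to the honeynet is served by two honeypots, each answering for 128 addresses.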
Measurement | Malware Collection
Gateway also serves as a firewall, preventing honeypots from conducting outbound attacks or infecting each other
Cross-infection is prevented by:
  Placing each honeypot on a separate VLAN and terminating cross-VLAN traffic
  Blocking outbound traffic on popular vulnerable ports (135, 139, 445, etc.)
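The two cross-infection rules amount to a simple per-packet decision. A sketch under stated assumptions: the `Packet` fields and the `gateway_allows` name are hypothetical, and `BLOCKED_OUTBOUND_PORTS` holds only the three ports named on the slide (the real list is longer, as the "etc." indicates).

```python
from dataclasses import dataclass
from typing import Optional

# Only the ports named on the slide; the real list is longer ("etc.").
BLOCKED_OUTBOUND_PORTS = {135, 139, 445}


@dataclass
class Packet:
    src_vlan: int            # VLAN of the sending honeypot
    dst_vlan: Optional[int]  # VLAN of the destination honeypot, None if outside the honeynet
    dst_port: int
    outbound: bool           # True if leaving the honeynet toward the Internet


def gateway_allows(pkt: Packet) -> bool:
    """Apply the two cross-infection rules from the slide (hypothetical encoding)."""
    if pkt.dst_vlan is not None and pkt.dst_vlan != pkt.src_vlan:
        return False  # terminate cross-VLAN traffic between honeypots
    if pkt.outbound and pkt.dst_port in BLOCKED_OUTBOUND_PORTS:
        return False  # block outbound traffic on popular vulnerable ports
    return True
```

Outbound IRC traffic (e.g., to port 6667) still passes, which is what lets infected honeypots keep their C2 connections alive.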
Measurement | Malware Collection
Gateway runs an IRC detection module
  Application-level traffic is searched for common IRC protocol strings (NICK, JOIN, USER)
  Once an IRC connection is witnessed, the detection module establishes a record for the IRC session
  When the honeypot attempts to reconnect, the connection is allowed to proceed to the IRC server
Measurement | Malware Collection
Detection module allows only one honeypot to connect to a given IRC server at any point in time
Gateway detects when a honeypot is infected
  Rules are then inserted to block inbound attacks to that honeypot
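A rough sketch of the detection module's bookkeeping. It assumes that any payload containing `NICK`, `USER`, or `JOIN` marks an IRC session and that seeing IRC traffic is the signal that a honeypot is infected; the class, its methods, and the choice to let the first connection through are hypothetical simplifications of what the slides describe.

```python
from typing import Dict, Set

IRC_MARKERS = (b"NICK ", b"USER ", b"JOIN ")  # common IRC protocol strings


class IrcDetector:
    """Hypothetical bookkeeping for the gateway's IRC detection module."""

    def __init__(self) -> None:
        self.sessions: Dict[str, str] = {}  # IRC server -> honeypot allowed to use it
        self.infected: Set[str] = set()     # honeypots whose inbound attacks are blocked

    @staticmethod
    def looks_like_irc(payload: bytes) -> bool:
        return any(marker in payload for marker in IRC_MARKERS)

    def allow(self, honeypot: str, server: str, payload: bytes) -> bool:
        """Return True if this honeypot's traffic to `server` may proceed."""
        if not self.looks_like_irc(payload):
            return True  # non-IRC traffic is left to the other firewall rules
        self.infected.add(honeypot)  # assumption: IRC traffic signals infection
        owner = self.sessions.setdefault(server, honeypot)  # record the session
        return owner == honeypot  # one honeypot per IRC server at a time

    def release(self, honeypot: str) -> None:
        """Forget a honeypot's sessions, e.g. after it has been re-imaged."""
        self.sessions = {s: h for s, h in self.sessions.items() if h != honeypot}
        self.infected.discard(honeypot)
```

Limiting each IRC server to a single honeypot keeps the honeynet from inflating a botnet's population with several of its own infected machines.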
Measurement | Malware Collection
Gateway also performs miscellaneous tasks:
  Triggering honeypot re-imaging
  Loading clean Windows images
  Pre-filtering for the download station
  Running a local DNS server to resolve DNS queries from honeypots
Measurement | Graybox Testing
Graybox testing is used to extract features of suspicious binaries
Analysis spans two distinct phases (performed on an isolated network segment)
  First phase derives the network fingerprint of the binary
  Second phase extracts the binary's IRC-specific features
Measurement | Graybox Testing
Phase 1: Creation of a network fingerprint
  A server acts as a network sink, so all network activity initiated by the malware is detected
  Traffic logs are automatically processed to extract the network fingerprint:
    DNS: targets of DNS requests
    IPs: destination IP addresses
    Ports: contacted ports and protocols
    Scan: whether default scanning behavior was detected
  Default scanning behavior: any attempt to contact more than 20 distinct destinations on the same port during the monitored period
$f_{net} = \langle DNS, IPs, Ports, Scan \rangle$
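Phase 1 can be sketched as a single pass over the sink's connection log that fills in the four components (DNS, IPs, Ports, Scan) and applies the 20-destination scan threshold. The `Conn` record layout and function name are assumptions; a real implementation would parse packet captures rather than receive pre-built records.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

SCAN_THRESHOLD = 20  # "more than 20 distinct destinations on the same port"


class Conn(NamedTuple):
    """One connection attempt seen by the sink (hypothetical record layout)."""
    dst_ip: str
    dst_port: int
    proto: str           # "tcp" or "udp"
    dns_query: str = ""  # hostname asked for, if this was a DNS request


def network_fingerprint(log: Iterable[Conn]) -> dict:
    """Build the (DNS, IPs, Ports, Scan) fingerprint from a sink log."""
    dns: Set[str] = set()
    ips: Set[str] = set()
    ports: Set[Tuple[int, str]] = set()
    dests_per_port: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for c in log:
        if c.dns_query:
            dns.add(c.dns_query)
        ips.add(c.dst_ip)
        ports.add((c.dst_port, c.proto))
        dests_per_port[(c.dst_port, c.proto)].add(c.dst_ip)
    scan = any(len(d) > SCAN_THRESHOLD for d in dests_per_port.values())
    return {"DNS": dns, "IPs": ips, "Ports": ports, "Scan": scan}
```

Counting distinct destinations per (port, protocol) pair, rather than total connections, keeps repeated reconnects to one IRC server from being mistaken for scanning.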
Measurement | Graybox Testing
Phase 2: Extraction of IRC-related features
  A modified version of the UnrealIRC daemon is instantiated on the network sink
  The IRC daemon listens on all ports ever observed in the network fingerprint
  Upon detecting an IRC connection, an IRC fingerprint is created:
    PASS: initial password to establish the IRC session
    NICK: nickname
    USER: username
    MODE: modes set
    JOIN: IRC channels to be automatically joined (and their associated passwords)
$f_{irc} = \langle PASS, NICK, USER, MODE, JOIN \rangle$
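The IRC fingerprint can be collected by recording the arguments of the five registration-related commands the bot sends to the sink's IRC daemon. A minimal sketch, assuming the client lines have already been split out of the TCP stream; the function name and sample values are hypothetical.

```python
from typing import Dict, List

IRC_FIELDS = ("PASS", "NICK", "USER", "MODE", "JOIN")


def irc_fingerprint(client_lines: List[str]) -> Dict[str, List[str]]:
    """Collect the arguments of PASS/NICK/USER/MODE/JOIN sent by the bot."""
    fp: Dict[str, List[str]] = {field: [] for field in IRC_FIELDS}
    for raw in client_lines:
        command, _, args = raw.strip().partition(" ")
        command = command.upper()
        if command in fp:
            fp[command].append(args)  # JOIN args keep any channel password
    return fp


print(irc_fingerprint(["PASS pw1\r\n", "NICK node01", "JOIN #chan chanpw"])["JOIN"])
# ['#chan chanpw']
```

Other commands (e.g., `PRIVMSG`) are ignored here, since they belong to the later "dialect" stage rather than the fingerprint.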
Measurement | Graybox Testing (Phase 2 continued…)
To learn the botnet's “dialect”, the bot connects to the local IRC server and enters its default channel
  An IRC query engine plays the role of the botmaster
  Bot behavior is learned by subjecting the bot to a series of commands
Command set includes:
  IRC commands observed in honeynet traces
  Commands extracted from publicly available bot source code
Measurement | Longitudinal Tracking
Botnet tracking is performed in two ways:
  A custom, lightweight IRC tracker
  Probing DNS caches across the globe
Measurement | Longitudinal Tracking
IRC Tracker
  “A modified IRC client that can join a specified IRC channel and automatically answer directed queries based on the template created by the graybox testing technique”
  The IRC tracker instantiates a new session to the IRC server using the fingerprint and template
  IRC trackers need to appear responsive
Measurement | Longitudinal Tracking
In order to appear “real”, the following must be performed:
  Traffic is filtered so that inappropriate information is not included in the template
    Filtering is performed automatically while the bot is executing
  Computer specifications (e.g., memory, disk space) are changed to resemble the specifications of a real machine
  The IRC query engine issues a set of commands that require stateful responses
    The recorded responses emulate a bot's stateful software
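The tracker's core behavior, answering directed queries from a learned template, can be sketched as a lookup keyed on the first word of a `PRIVMSG`. The function name, template contents, command names, and reply text are hypothetical, and the stateful responses described above are not modeled.

```python
from typing import Dict, Optional


def tracker_reply(line: str, nick: str, channel: str,
                  template: Dict[str, str]) -> Optional[str]:
    """Answer a directed query from a learned template, or return None."""
    sender = ""
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
        sender = prefix.split("!", 1)[0]        # ":nick!user@host" -> "nick"
    head, sep, text = line.rstrip("\r\n").partition(" :")
    parts = head.split()
    if not sep or len(parts) != 2 or parts[0] != "PRIVMSG":
        return None
    target = parts[1]
    if target not in (nick, channel):
        return None                             # not directed at the tracker
    words = text.split()
    if not words or words[0] not in template:
        return None                             # unknown query: stay silent
    dest = channel if target == channel else sender
    return f"PRIVMSG {dest} :{template[words[0]]}\r\n"
```

Staying silent on unknown queries is a design choice in this sketch; a silent bot is arguably less conspicuous than one that returns a wrong answer.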
Measurement | Longitudinal Tracking
DNS Tracking
  Most bots issue DNS queries to resolve the IP addresses of their IRC servers
  Caches of DNS servers are probed to determine the number of DNS servers giving cache hits
    A “cache hit” implies that at least one client queried the DNS server during the lifetime of its DNS entry
Measurement | Longitudinal Tracking
Original list contained 1.6 million DNS servers
First filter removed name servers under top-level domains such as .gov and .mil
Second filter checked the consistency of replies using two consecutive DNS queries:
  First query was recursive, forcing the DNS server to completely resolve the query
  Second query was non-recursive, obtaining the local answer from the server's cache
  TTL field in the second response should be smaller than in the first
After filtering, the master list consisted of 800,000 name servers
For a given IRC server, the caches of all DNS servers were probed and any associated cache hits recorded
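Once the answers are collected, the second filter and the cache-hit count reduce to simple comparisons. This sketch assumes the TTLs have already been obtained (for example, by sending one query with the recursion-desired flag set and one with it cleared); it only encodes the decision logic, follows the slide's "smaller TTL" rule strictly, and uses hypothetical type, function, and server names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConsistencyProbe:
    recursive_ttl: int         # TTL in the reply to the recursive query
    cached_ttl: Optional[int]  # TTL in the reply to the non-recursive query (None = no answer)


def passes_consistency_filter(p: ConsistencyProbe) -> bool:
    """Keep a name server only if its cached TTL is smaller than the fresh one."""
    return p.cached_ttl is not None and p.cached_ttl < p.recursive_ttl


def count_cache_hits(cached_answers: Dict[str, Optional[int]]) -> int:
    """Count name servers whose non-recursive answer came from cache (TTL present)."""
    return sum(ttl is not None for ttl in cached_answers.values())
```

A server that returns the same or a larger TTL from "cache" is not counting down as a real cache would, so its later cache-hit answers cannot be trusted.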
Results and Analysis
Results include:
  Traffic traces captured on the local darknet (3-month period)
  IRC logs (3-month period)
  DNS cache hit results from tracking 65 IRC servers (45-day period)
Results | Botnet Prevalence
Botnet traffic share
  Two-week snapshot of total incoming SYN packets to the local darknet vs. packets originating from botnet spreaders
  A botnet spreader is any source that delivered a bot executable
  27% of incoming SYNs are attributed to botnet spreaders
  76% come from botnet spreaders if only the ports targeted by botnet spreaders are considered
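The two traffic-share figures differ only in which SYNs form the denominator. A sketch, assuming each SYN is reduced to a (source, destination port) pair and that "target ports" means the ports that known spreaders contacted; the function name and data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def spreader_share(syns: Iterable[Tuple[str, int]], spreaders: Set[str],
                   target_ports_only: bool = False) -> float:
    """Fraction of darknet SYNs sent by known botnet spreaders."""
    syns = list(syns)
    if target_ports_only:
        # Restrict the denominator to ports that spreaders themselves targeted.
        target_ports = {port for src, port in syns if src in spreaders}
        syns = [(src, port) for src, port in syns if port in target_ports]
    if not syns:
        return 0.0
    return sum(src in spreaders for src, _ in syns) / len(syns)


sample = [("a", 445), ("b", 445), ("c", 80), ("d", 22)]
print(spreader_share(sample, {"a"}))        # 0.25
print(spreader_share(sample, {"a"}, True))  # 0.5
```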
Results | Botnet Prevalence
More than 90% of all traffic during peaks targeted ports used by botnet spreaders
More than 70% of sources during peak periods sent shell exploits
This suggests the total amount of botnet-related traffic is far greater than 27%
Results | Botnet Prevalence
11% (85,000) of probed DNS servers showed cache hits for at least one tracked botnet
55% of servers in the dataset are .com name servers
  82% of DNS cache hits came from name servers in that domain
  29% of .com servers had at least one cache hit
.cn servers make up only 0.2% of total servers
  95% of them exhibited botnet activity
Results | Spreading Methods
Botnets use a variety of means to spread and recruit new victims: e-mail, the Web, and active scanning (most prevalent)
Botnets can be grouped into two types:
  Type I: worm-like
    Continuously scan ports following a target selection algorithm
  Type II: variable scanning behavior
    Use a number of scanning algorithms (uniform, non-uniform, localized)
Results | Spreading Methods
192 botnets captured; 34 were Type I
  Upon infection, the bot starts scanning the IP space for new victims
  It also initiates connections to IRC servers (identified by a hard-coded list of DNS names)
All IRC servers/channels the bots tried to join were unreachable:
  The channel was banned by the public IRC server, or
  The DNS name did not resolve to a valid IP address
Still, the botnet grew over time due to the persistence of its scanning
Results | Spreading Methods
Type II botnets were the most prevalent class
  Scanning is triggered by a command
  More difficult to track due to continuously changing behavior
Localized and targeted scanning were the most prevalent techniques
  Localized scanning focused on Class B address space
  Targeted scanning focused on Class A address space
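From the observer's side, localized and targeted scanning can be told apart by where the destinations fall. The heuristic below is an illustrative assumption, not the authors' classifier: "localized" if most targets share the scanning host's /16 (Class B), "targeted" if most fall into a single /8 (Class A), otherwise "uniform"; the 0.5 threshold is arbitrary and the function name is hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def scan_locality(src: str, targets: Iterable[str], threshold: float = 0.5) -> str:
    """Label an IPv4 scan as localized, targeted, or uniform (illustrative heuristic)."""
    addrs = [ipaddress.IPv4Address(t) for t in targets]
    if not addrs:
        return "none"
    own_class_b = ipaddress.IPv4Network(f"{src}/16", strict=False)
    if sum(a in own_class_b for a in addrs) / len(addrs) >= threshold:
        return "localized"  # mostly inside the scanner's own /16
    top_class_a_count = Counter(int(a) >> 24 for a in addrs).most_common(1)[0][1]
    if top_class_a_count / len(addrs) >= threshold:
        return "targeted"   # concentrated in one /8
    return "uniform"
```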
Results | Growth Patterns
Two approaches were taken to examine botnet growth patterns:
  The cumulative number of unique DNS cache hits for distinct botnets was plotted over time
  The growth pattern was compared to the behavior learned from the IRC tracker
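The first approach, a cumulative count of unique name servers reporting a cache hit for a botnet's IRC server, can be sketched like this; the (day, name server) input format and the function name are assumptions.

```python
from typing import Iterable, List, Tuple


def cumulative_unique_hits(hits: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Turn (day, name_server) cache-hit records into (day, cumulative unique servers)."""
    seen = set()
    series: List[Tuple[int, int]] = []
    for day, server in sorted(hits):
        seen.add(server)
        if series and series[-1][0] == day:
            series[-1] = (day, len(seen))  # update today's point
        else:
            series.append((day, len(seen)))
    return series


print(cumulative_unique_hits([(1, "ns1"), (1, "ns2"), (2, "ns1"), (3, "ns3")]))
# [(1, 2), (2, 2), (3, 3)]
```

Because a name server is counted only once, the curve stays flat on days when only already-seen servers report hits, so its slope reflects spread into new networks.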
Results | Growth Patterns
Botnets with semi-exponential growth patterns exhibit persistent random scanning activity (unchanging over time)
  Example: for one botnet, the topic of the corresponding channel was set, for one month, to randomly scan port 445 indefinitely
  This pattern is related to worm infections
Results | Growth Patterns
This growth pattern is also representative of botnets with intermittent activity profiles
  Example: Botnet III corresponds to a botnet that infected the honeypots on 3/13/2006
  Its IRC server went down between 4/12/2006 and 4/30/2006
  When the IRC server became available again, the growth slope increased and the honeypots were re-infected by the same botnet
Results | Growth Patterns
Other botnets predominantly used time-scoped scanning commands, as opposed to the continuous scanning seen in the previous two examples
Results | Growth Patterns
Botnet evolution was estimated by counting the unique sources of messages broadcast to the channel
  Only botnets of comparable size were plotted together on a given plot
  Trends confirm the heterogeneity of botnets
Results | Botnet Structures
60% of 318 collected malicious binaries were IRC bots
Four predominant IRC structures were revealed:
  All bots connected to a single IRC server
    Prevalent among smaller botnets (a few hundred users)
    70% of observed botnets fell into this category
  IRC servers connected to form an IRC network supporting large numbers of users
    30% of botnets were bridged across multiple servers; of these, 50% were bridged between only two servers
  Seemingly unrelated botnets that appear similar when comparing their naming conventions, channel names, and operators' user IDs
    These botnets may actually belong to the same botmaster
  A selected group of bots is commanded to download an updated binary
    This results in those bots moving to a different IRC server
Results | Effective Botnet Size
Botnet footprint can become fairly large (> 15,000 bots)
Predominant structures were botnets managed by a single server or a few servers
Distinction drawn between:
  Botnet's footprint: the total population of infected bots
  Effective size: the number of bots connected to the IRC channel at a given time
Results | Effective Botnet Size
Some “chatty” IRC servers broadcast join/leave information for members of the channel
  The number of online bots versus time for these IRC servers is plotted in Figure 9
The maximum size of the online population is significantly smaller than the botnet's footprint
  Footprint greater than 10,000, yet no more than 3,000 bots online at the same time
Effective size has little impact on long-term activity; however, it affects the number of bots available to execute commands in a timely manner
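Given join/leave broadcasts from a "chatty" server, footprint and peak effective size can both be read off the event stream. A sketch under the assumption that a bot's nickname identifies it uniquely (real bots may change nicknames, which would inflate the footprint); the event format and function name are hypothetical.

```python
from typing import Iterable, Tuple


def footprint_and_peak(events: Iterable[Tuple[float, str, str]]) -> Tuple[int, int]:
    """Return (footprint, peak effective size) from join/part broadcasts.

    events: (timestamp, nickname, "join" or "part").
    """
    online, ever_seen = set(), set()
    peak = 0
    for _, nick, kind in sorted(events):
        if kind == "join":
            online.add(nick)
            ever_seen.add(nick)
        else:
            online.discard(nick)
        peak = max(peak, len(online))
    return len(ever_seen), peak


events = [(0, "a", "join"), (1, "b", "join"), (2, "a", "part"),
          (3, "c", "join"), (4, "b", "part")]
print(footprint_and_peak(events))  # (3, 2)
```

With high churn, `ever_seen` keeps growing while `online` stays small, which is exactly the footprint vs. effective-size gap reported above.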
Results | Lifetime
The discrepancy between footprint and effective size is likely due to the long lifetime of a typical botnet
  Bot death rates and high churn rates can affect a botnet's effective size
Results | Lifetime
High churn rates: bots do not stay long on the IRC channel
  Average stay time: 25 minutes
  90% stay less than 50 minutes
Likely causes include:
  Client instability (as a result of infection)
  Machine hibernation
  Botmasters commanding bots to leave the channel
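Stay times, and the average and 90th-percentile figures quoted above, can be derived from the same kind of join/leave events. A sketch assuming events of the form (time in minutes, nickname, "join" or "part") and a nearest-rank percentile; the function names are hypothetical, and `summarize` expects a non-empty list.

```python
import math
from statistics import mean
from typing import Iterable, List, Tuple


def stay_times(events: Iterable[Tuple[float, str, str]]) -> List[float]:
    """Pair each 'join' with the same nick's next 'part' and return durations."""
    joined = {}
    stays = []
    for t, nick, kind in sorted(events):
        if kind == "join":
            joined[nick] = t
        elif nick in joined:
            stays.append(t - joined.pop(nick))
    return stays


def summarize(stays: List[float]) -> Tuple[float, float]:
    """Return (average stay, nearest-rank 90th percentile)."""
    ordered = sorted(stays)
    p90 = ordered[math.ceil(0.9 * len(ordered)) - 1]
    return mean(ordered), p90
```

Sessions still open when the trace ends are dropped by `stay_times`, which biases the estimate slightly toward short stays.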
Results | Botnet Software Taxonomy
183 of 192 confirmed IRC-based bot executables responded to probes from the IRC query engine
49% of bots run an AV/FW killer: a utility that disables anti-virus and firewall processes
43% run an identd server, which performs user identification
  Ensures only intended bots join a given IRC channel
40% run a system security monitor, which tightens the infected host's security
  E.g., disables the DCOM service and file sharing
38% run a registry monitor, which alerts the bot to any attempts to disable it
Results | Botnet Software Taxonomy
Number of exploits within bot binaries varied from 3 to 29, with an average of 15 exploits per binary
Most popular exploits (appearing in over 75% of binaries):
  DCOM (port 135)
  LSASS (port 445)
  NTPASS
Results | Botnet Software Taxonomy
Authors evaluated the effectiveness of ClamAV and Norton AntiVirus on the 192 malicious binaries
  ClamAV classified 137 binaries as malicious
  Norton AntiVirus classified 179 binaries as malicious
Windows XP Service Pack 2 is still not immune
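Expressed as detection rates (computed here from the counts on this slide):

$$\text{ClamAV: } \frac{137}{192} \approx 71.4\%\ \text{(55 missed)}, \qquad \text{Norton: } \frac{179}{192} \approx 93.2\%\ \text{(13 missed)}$$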
Results | “Insider’s view”
Traces show that:
  Botmasters share information about which prefixes should not be scanned
  Bots are tweaked to minimize chatter on the C2 channel
  Bots are probed to detect and isolate “misbehavers”
  Botmasters also look for “super-bots” with high-bandwidth network links and large storage capacities
Results | “Insider’s view”
Bots migrate from one IRC channel to another, instructed by:
  A command from the botmaster
  A download of replacement software that points to a different C2 server
Results | “Insider’s view”
Control commands include channel joins and leaves
Mining category includes commands that collect machine specifications
Attack category includes commands from botmasters to attack other network computers
Results | “Insider’s view”
Small botnets receive a larger portion of control and mining commands
  This suggests hands-on botmasters who devote large amounts of time to manually controlling their botnets
Medium and large botnets have a larger percentage of cloning and download commands
  Cloning could include using one botnet to attack another botnet by overloading its IRC server with join requests
Conclusion
Botnets are a major contributor to overall unwanted Internet traffic
  Most botnet traffic can be attributed to scans used to recruit new bots
IRC is still the dominant protocol used for C2 communications
Effective sizes of botnets range from a few hundred to a few thousand bots
  Botnet footprints are usually much larger than their effective size
  This is due to the high churn rate within a botnet: a bot's average channel occupancy is less than half an hour
Graybox testing revealed the sophistication of modern bot software (e.g., self-protection measures)
Contributions
Established empirical measurements of botnet prevalence
  Particularly by considering DNS cache hits for the IRC botnets that were tracked
Classified typical features of bot binaries
  Registry monitoring tactics
  Locking down host vulnerabilities
Classified the most prevalent botnet activities as a function of botnet size
Delineated between botnet footprint and “effective size”
Large experimental samples further solidified the results
Critique
Focused mainly on Windows-based systems
  It would be interesting to see the effectiveness of the noted infection strategies on Unix systems
Only evaluated two anti-virus applications
  Perhaps include other popular anti-virus applications (McAfee, Symantec Corporate, AVG, etc.)
Authors noted that 60% of the binaries collected were IRC bots
  Did the other 40% use a different communication mechanism?
  If so, it would be interesting to know how they were structured and whether the authors evaluated them in any way
References
[1] Rajab, M. A., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil.