Post on 20-Dec-2015
BotMiner: Clustering Analysis of Network
Traffic for Protocol- and Structure-Independent Botnet Detection
Written by Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
Georgia Institute of Technology
Presented by Latasha A. Gibbs
University of South Carolina
OUTLINE
• Definitions and Introduction to Botnet Problem
• Detection Framework and Implementation
• Traffic Monitors and Clustering
• Experiments & Evaluations
• Related Work
• Future Work & Conclusion
What is a Bot?
• Software application that can run automated tasks over the Internet
• Performs tasks that are simple and structurally repetitive
• Used when emulation of human activity is required
• Used where a response speed faster than that of humans is required
• Examples include gaming bots, chat bots, and auction-site robots
What is the Command and Control (C&C) Channel?
• The Command and Control (C&C) channel is needed so bots can receive their commands and coordinate fraudulent activities
• The C&C channel is the means by which individual bots form a botnet
Definition of Botnet
• General: a collection of compromised computers connected to the Internet
• In the paper: a coordinated group of malware instances that are controlled via a C&C communication channel
Botnet Diagram
http://www.netconclave.com/blog/wp-content/uploads/botnet.jpg
The Problem
• Botnets are becoming one of the most serious threats to Internet security
  (one quarter of all PCs are part of a botnet, according to Vint Cerf)
• Botnets are evolving and becoming more flexible
• Prior to this research, most detection approaches worked only on specific command and control (C&C) protocols (such as IRC and HTTP) and structures (centralized)
Centralized Structure vs. Peer-to-Peer (P2P) Structure
Top 10 Most Wanted Botnets*
http://www.networkworld.com/news/2009/072209-botnets.html
1. Zeus (3.6 million)
2. Koobface (2.9 million)
3. TidServ (1.5 million)
4. Trojan.Fakeavalert (1.4 million)
5. TR/DIdr.Agent.JKH (1.2 million)
6. Monkif (520,000)
7. Hamweq (480,000)
8. Swizzor (370,000)
9. Gammima (230,000)
10. Conficker (210,000)
* Compromised US computers
Botnets are utilized to perform the following:
• Distributed Denial-of-Service Attacks
• Spam
• Phishing
• Identity Theft
• Information Exfiltration
MAIN COMPONENTS OF BOTMINER DETECTION SYSTEM
1. C-PLANE MONITOR
2. A-PLANE MONITOR
3. C-PLANE CLUSTERING
4. A-PLANE CLUSTERING
5. CROSS-PLANE CORRELATOR
Architecture of BotMiner
Traffic Monitors
C-PLANE MONITOR
• Captures network flows and records information on “who is talking to whom”
• The fcapture tool was used (very efficient on high-speed networks)
• Each flow record contains: time, duration, source IP, destination IP, destination port, and number of packets and bytes transferred in both directions
A-PLANE MONITOR
• Logs information on “who is doing what”
• Based on Snort (open-source intrusion detection tool)
• Capable of detecting scanning activities, spamming, and binary downloading
C-PLANE CLUSTERING (Section 2.5)
• Responsible for reading logs generated by the C-plane monitor and finding clusters of machines that share similar communication patterns
• First, irrelevant traffic flows are filtered out (2 steps: basic filtering and white-listing)
• After basic filtering and white-listing, traffic is reduced further by aggregating related flows into communication flows (C-flows)
ARCHITECTURE OF C-PLANE CLUSTERING
Figure 3
C-PLANE CLUSTERING CONT’D
Given an epoch E (1 day), all m TCP/UDP flows that share the same:
• protocol (TCP or UDP)
• source IP
• destination IP
• port
are aggregated into the same C-flow, denoted as $c_i = \{f_j\}_{j=1..m}$, where each $f_j$ is a single TCP/UDP flow.
Basically, the set of all the n C-flows tells “who was talking to whom” during that epoch.
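The aggregation rule above can be sketched in a few lines of Python. This is only an illustration: the `Flow` record, its field names, and `aggregate_cflows` are hypothetical and are not taken from the paper or from the fcapture tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One flow record; field names are illustrative assumptions."""
    start: float     # seconds since the start of the epoch
    duration: float  # seconds
    proto: str       # "TCP" or "UDP"
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int     # packets transferred, both directions
    nbytes: int      # bytes transferred, both directions


def aggregate_cflows(flows):
    """Group flows that share (protocol, source IP, destination IP, port)."""
    cflows = defaultdict(list)
    for f in flows:
        cflows[(f.proto, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)


flows = [
    Flow(0.0, 2.0, "TCP", "10.0.0.5", "203.0.113.9", 6667, 10, 800),
    Flow(3600.0, 1.5, "TCP", "10.0.0.5", "203.0.113.9", 6667, 12, 900),
    Flow(10.0, 0.5, "UDP", "10.0.0.7", "198.51.100.2", 53, 2, 150),
]
cflows = aggregate_cflows(flows)
```

Each dictionary key identifies one C-flow, and its value is the list of individual flows in that C-flow for the epoch.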
Vector Representation of C-flows
• To apply clustering algorithms to C-flows, they must be translated into a suitable vector representation
• A number of statistical features are extracted from each C-flow and translated into a d-dimensional pattern vector
Given a C-flow, the discrete sample distribution is computed for 4 variables:
1. The number of flows per hour (fph)
2. The number of packets per flow (ppf)
3. The average number of bytes per packet (bpp)
4. The average number of bytes per second (bps)
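A rough sketch of how the four variables could be turned into a fixed-length vector is given below. The slide does not say how each distribution is summarised, so this example assumes quantile cut points as the summary. The dictionary keys, `cflow_samples`, `to_vector`, and the choice of `n_bins=4` are all illustrative assumptions.

```python
from collections import Counter
from statistics import quantiles


def cflow_samples(flows, epoch_hours=24):
    """flows: list of dicts with keys start (s), duration (s), packets, nbytes."""
    per_hour = Counter(int(f["start"] // 3600) for f in flows)
    return {
        # flows per hour, counting hours with no flows as 0 (an assumption)
        "fph": [per_hour.get(h, 0) for h in range(epoch_hours)],
        "ppf": [f["packets"] for f in flows],
        "bpp": [f["nbytes"] / f["packets"] for f in flows if f["packets"] > 0],
        "bps": [f["nbytes"] / f["duration"] for f in flows if f["duration"] > 0],
    }


def to_vector(samples, n_bins=4):
    """Summarise each variable by its n_bins - 1 quantile cut points."""
    vector = []
    for name in ("fph", "ppf", "bpp", "bps"):
        values = samples[name]
        if len(values) < 2:  # quantiles() needs at least two data points
            values = (values or [0.0]) * 2
        vector.extend(quantiles(values, n=n_bins, method="inclusive"))
    return vector


flows = [
    {"start": 0, "duration": 2.0, "packets": 10, "nbytes": 800},
    {"start": 100, "duration": 1.0, "packets": 5, "nbytes": 500},
    {"start": 3700, "duration": 4.0, "packets": 20, "nbytes": 2000},
]
samples = cflow_samples(flows)
vector = to_vector(samples)  # 4 variables x 3 cut points = 12 dimensions
```

With this summary, every C-flow maps to a vector of the same length d regardless of how many flows it contains, which is what a clustering algorithm needs.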
Example of Results
2-Step Clustering
• Clustering C-flows is very expensive
• Because the percentage of machines in a network that are infected by bots is generally small, the authors separate the botnet-related C-flows from a large number of benign C-flows
• To cope with the complexity of clustering, the task is broken down into two steps
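The two-step idea can be illustrated as a cheap coarse pass over a reduced feature vector, followed by a finer pass over the full vector inside each coarse cluster. The sketch below swaps in a simple greedy "leader" clustering so that it stays self-contained. It is not the clustering algorithm used by the authors, and the `reduced` and `full` keys and both radius parameters are hypothetical.

```python
from math import dist


def leader_cluster(items, key, radius):
    """Greedy clustering: join the first cluster whose leader is within radius."""
    clusters = []  # list of (leader_vector, members)
    for item in items:
        v = key(item)
        for leader, members in clusters:
            if dist(leader, v) <= radius:
                members.append(item)
                break
        else:
            clusters.append((v, [item]))
    return [members for _, members in clusters]


def two_step_cluster(cflows, coarse_radius, fine_radius):
    """Step 1: coarse clusters on reduced features. Step 2: refine on full features."""
    fine = []
    for coarse in leader_cluster(cflows, lambda c: c["reduced"], coarse_radius):
        fine.extend(leader_cluster(coarse, lambda c: c["full"], fine_radius))
    return fine


cflows = [
    {"id": "a", "reduced": [1.0, 0.1], "full": [1.0, 1.0, 1.1, 0.9]},
    {"id": "b", "reduced": [1.1, 0.1], "full": [1.0, 1.1, 1.0, 0.9]},
    {"id": "c", "reduced": [1.0, 0.2], "full": [5.0, 0.1, 0.1, 0.1]},
    {"id": "d", "reduced": [9.0, 4.0], "full": [9.0, 9.0, 9.0, 9.0]},
]
result = two_step_cluster(cflows, coarse_radius=1.0, fine_radius=0.5)
ids = [[c["id"] for c in cluster] for cluster in result]
```

Here "c" looks like "a" and "b" in the reduced space but is split off in the second step, while the expensive full-vector comparison is never run between "d" and the others.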
2-Step Clustering of C-flows
A-PLANE CLUSTERING
• In this stage, two-layer clustering is performed on activity logs
• A scan activity feature could be the scanned ports (e.g., two machines scanning the same ports)
• Another feature could be the target subnet or distribution (e.g., machines scanning the same subnet)
• For spam activity, two machines could be clustered together if their SMTP connection destinations overlap heavily
• In the paper, the authors cluster scanning activities according to the destination scanning ports
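A minimal sketch of two-layer activity clustering follows, assuming the first layer splits hosts by activity type and the second groups hosts with matching features. For simplicity it requires identical feature sets, which is stricter than the "highly overlapped" criterion mentioned for spam. The log tuple layout and `a_plane_clusters` are illustrative assumptions.

```python
from collections import defaultdict


def a_plane_clusters(activity_log):
    """activity_log: iterable of (host, activity_type, feature) tuples.

    For scans the feature is a destination port; for spam it could be
    an SMTP destination.
    """
    # Layer 1: split by activity type and collect each host's features.
    by_type = defaultdict(lambda: defaultdict(set))
    for host, activity, feature in activity_log:
        by_type[activity][host].add(feature)

    # Layer 2: within one type, group hosts whose feature sets are identical.
    clusters = {}
    for activity, hosts in by_type.items():
        groups = defaultdict(set)
        for host, features in hosts.items():
            groups[frozenset(features)].add(host)
        clusters[activity] = list(groups.values())
    return clusters


log = [
    ("h1", "scan", 445), ("h1", "scan", 139),
    ("h2", "scan", 139), ("h2", "scan", 445),
    ("h3", "scan", 22),
    ("h4", "spam", "mx.example.net"),
]
clusters = a_plane_clusters(log)
```

In this example h1 and h2 land in the same scan cluster because they scan the same ports, while h3 forms its own cluster.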
CROSS-PLANE CORRELATION (Section 2.7)
• The idea is to cross-check both sets of clusters (A-PLANE and C-PLANE) to find evidence that a host is part of a botnet
• The first step is to compute the bot score s(h) for each host h on which at least one kind of suspicious activity has been observed
• Higher weights are assigned to “strong” activities like spam or exploits
• Lower weights are assigned to “weak” activities like scanning or binary downloads
• Hosts that have a score below a certain threshold are filtered out
• The remaining, most suspicious hosts are grouped together according to a similarity metric that takes into account the A-PLANE and C-PLANE clusters
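The slide does not give the scoring formula, so the sketch below assumes one plausible form: a host earns weighted Jaccard overlap for every pair of differently typed A-plane clusters it belongs to, and for every (A-plane cluster, C-plane cluster) pair it belongs to. The weights, the threshold, and all names here are hypothetical.

```python
from itertools import combinations

# Hypothetical weights: "strong" activities score higher than "weak" ones.
WEIGHTS = {"spam": 1.0, "exploit": 1.0, "scan": 0.5, "download": 0.5}


def jaccard(a, b):
    """Overlap of two host sets, between 0 and 1."""
    return len(a & b) / len(a | b)


def bot_score(host, a_clusters, c_clusters, weights=WEIGHTS):
    """a_clusters: list of (activity_type, host_set); c_clusters: list of host_set."""
    mine_a = [(t, hosts) for t, hosts in a_clusters if host in hosts]
    mine_c = [hosts for hosts in c_clusters if host in hosts]
    score = 0.0
    # Overlap between A-plane clusters of different activity types.
    for (t1, a1), (t2, a2) in combinations(mine_a, 2):
        if t1 != t2:
            score += weights[t1] * weights[t2] * jaccard(a1, a2)
    # Overlap between A-plane and C-plane clusters.
    for t, a in mine_a:
        for c in mine_c:
            score += weights[t] * jaccard(a, c)
    return score


a_clusters = [("scan", {"h1", "h2"}), ("spam", {"h1", "h2", "h3"})]
c_clusters = [{"h1", "h2"}, {"h4", "h5"}]
hosts = ["h1", "h2", "h3", "h4", "h5"]
threshold = 1.0  # hypothetical cut-off
suspicious = [h for h in hosts if bot_score(h, a_clusters, c_clusters) >= threshold]
```

Hosts h1 and h2 both scan, spam, and share a communication cluster, so they score well above h3, which only appears in one activity cluster.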
Hierarchical Clustering & Dendrogram
• The figure shows a hypothetical example
• The Davies-Bouldin (DB) validation index is used to find the best dendrogram cut
• The figure shows that the best cut suggested by the DB index is at height 90
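To make the role of the DB index concrete, the sketch below computes it for two candidate partitions of the same points and keeps the one with the lower value. This uses the textbook definition of the index (average, over clusters, of the worst ratio of combined scatter to centroid distance). The sample points and candidate cuts are made up for illustration and are not the data behind the figure.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def davies_bouldin(clusters):
    """clusters: list of lists of points, at least two clusters with
    distinct centroids. Lower values mean tighter, better separated clusters."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    total = 0.0
    for i in range(len(clusters)):
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)


# Two candidate cuts of the same four points.
cut_a = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
cut_b = [[(0, 0), (0, 1), (10, 0)], [(10, 1)]]
best = min((cut_a, cut_b), key=davies_bouldin)
```

In a dendrogram setting, each candidate cut height yields one partition, and the height whose partition minimises the index is selected.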
EVALUATIONS
• Tested performance on several real-world network traces (campus network)
• C-PLANE and A-PLANE monitors were run continuously for 10 days
• Traces were collected from 6 different botnets (IRC and HTTP)
• Two P2P botnets were also included, namely Nugache (82 bots) and Storm (13 bots); the network trace lasted a whole day
Collected Trace Results
Detection Results
Limitations
• Adversaries who learn the details of the BotMiner detection framework and implementation may find ways to evade detection
• Attackers may be able to evade C-PLANE and A-PLANE monitoring and clustering, or the cross-plane correlation analysis
Evading C-PLANE Monitoring and Clustering Cont’d
Evasion Method
• Switch between multiple C&C servers
• Randomize individual communication patterns (e.g. by injecting random packets into a flow or by padding a packet with random bytes)
• Use covert channels to hide the actual C&C communications
Examples
• Manipulate communication patterns
Evading A-PLANE Monitoring and Clustering
Evasion Method
• Perform very stealthy malicious activities
• Vary the way bots are commanded within the same monitored network
Example
• Scan very slowly (e.g. send one scan per hour)
• The “botmaster” sends out different commands to each bot
Evading Cross-Plane Analysis
• The “botmaster” can send commands for tasks that are extremely delayed
• Malicious activities are performed on different days
Trade-off: The “botmaster” also suffers, because as the C&C communications slow down, the efficiency of controlling the bot army declines
SOLUTIONS
• Use multiple days of data
• Cross-check back several days
TRADE-OFF
• More false positives may be generated
• If the PC is powered off or disconnected from the Internet, the bot is unavailable to the “botmaster”
Related Work
• Paper by Gu, Zhang, and Lee: BotSniffer
• Proposed approach: use network-based anomaly detection to identify botnet C&C channels in local area networks without any prior knowledge of signatures or C&C server addresses
• Contribution: understanding and detecting the C&C channel has great value in the battle against botnets
Note: If an active C&C server is taken down or interrupted, the “botmaster” will not be able to control the botnet
BotSniffer Architecture
BotSniffer Cont’d
• If certain conditions are satisfied, BotSniffer can detect the botnet C&C channel even if there is only 1 bot in the monitored network
• BotSniffer was tested on several network traces in two modes: stand-alone and normal traces
• BotSniffer has two main components: the monitor engine and the correlation engine
• C&C detection module relies on known signatures
• Possible evasion methods include evasion using a white-list, evasion by long delays, evasion by injecting random noise packets, and evasion by encryption
Related Work
• Researchers use honeypot techniques to collect and analyze bots (e.g. Nepenthes)
• TAMD is a system used to detect malware (including botnets) by aggregating traffic that shares the same destination, has similar payloads, and involves hosts with similar OS platforms
• Rishi is a signature-based IRC botnet detection system that matches known IRC bot nickname patterns (http://rishi.sourceforge.net/)
Related Work Cont’d
Most of the systems mentioned in the paper are limited to specific botnet protocols and structures, and many work only on IRC-based botnets
Future Work
• Develop new techniques to monitor and cluster the communication and activity patterns of botnets, making them more robust to evasion attempts
• Improve the efficiency of the C-flow conversion and clustering algorithms
• Combine different correlation techniques
• Develop a new real-time detection system based on a layered design, using sampling techniques that work in large high-speed networks
Predictions
• Research home networks and mobile devices, since they are primary targets
• Research socialbots, since Internet criminals are gathering and selling vast quantities of data
• Monitor virtual environments, since “botmasters” are now able to detect whether defenders are using virtual machines
Conclusion
• Botnet detection is a challenging problem
• The BotMiner Detection System is independent of the protocol and structure used by most botnets
• BotMiner shows excellent detection accuracy on various types of botnets, including IRC, HTTP, and P2P, with a very low false positive rate on normal traffic
Free Tools
• RUBotted (2.0 Beta) by Trend Micro: www.free.antivirus.com/rubotted/
• BotHunter: www.bothunter.net (Windows, Linux, FreeBSD, and MacOs)
• Microsoft Security Essentials: http://windows.microsoft.com/enUS/windows/products/security-essentials
REFERENCES
• [1] G. Gu, J. Zhang, and W. Lee. BotSniffer: Detecting botnet command and control channels in network traffic. In Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS ’08), 2008.
• [2] Botnet. http://en.wikipedia.org/wiki/Botnet
• [3] BotHunter. http://www.bothunter.net
• [4] E. Messmer. America’s 10 Most Wanted Botnets. July 22, 2009. http://www.networkworld.com/news/2009/072209-botnets.html
• [5] RUBotted. http://www.free.antivirus.com/rubotted/
• [6] Whitelist. http://en.wikipedia.org/wiki/Whitelist
• [7] P. Baecher, T. Holz, M. Kotter, and G. Wicherski. Know your enemy: Tracking botnets. http://www.honeynet.org/papers/bots/, 2005.
• [8] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna. Your Botnet is My Botnet: Analysis of a Botnet TakeOver. In Proceedings of the ACM CCS, 2009.
• [9] P. Baecher, M. Koeter, T. Holz, M. Dornseif, and F. Freiling. The nepenthes platform: An efficient approach to collect malware. In Proceedings of International Symposium on Recent Advances in Intrusion Detection (RAID ’06), Hamburg, September 2006.
• [10] N. Anderson. http://arstechnica.com/old/content/2007/01/8707.ars
THANK YOU!
Questions or Comments…