
How Different Are Malware Collected Actively and Passively?

Ying-Dar Lin 1, Chia-Yin Lee 2, Yu-Sung Wu 1, Pei-Hsiu Ho 1, Fu-Yu Wang 3, Yi-Lang Tsai 4

1 Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
[email protected]; [email protected]; [email protected]

2 Information & Communication Technology Laboratories, National Chiao Tung University, Hsinchu, Taiwan
[email protected]

3 Network Benchmarking Laboratory, National Chiao Tung University, Hsinchu, Taiwan
[email protected]

4 National Center for High-Performance Computing, Tainan, Taiwan
[email protected]

Abstract--A new open-source tool chain for malware collection, detection, and analysis is presented and evaluated. It actively collects malware through two channels: web links and peer-to-peer file sharing. It then detects malware with multiple anti-virus scanners and analyzes their host and network activities on virtual machines. The evaluation shows the differences between the malware collected by the traditional passive honeypot approach and this active approach in terms of distribution, timeliness, and degree of network and host activity, i.e., activeness. These two collections are quite distinct and disjoint. Among the 800 and 354 malware programs collected in one month actively and passively, respectively, 79% of the passively captured malware are active bots and 59% of the actively captured malware are passive Trojans. 16% of the actively captured samples are zero-day malware, whereas no zero-day malware was captured by the passive approach. Moreover, the passive approach receives mostly (98%) malware with network behavior, while the active approach collects both kinds: 77% with network behavior and 23% with only host behavior or no action.

Keywords: honeypot, malware, behavior analysis, vulnerability

Digital Object Identifier 10.1109/MC.2013.226 0018-9162/$26.00 2013 IEEE

This article has been accepted for publication in Computer but has not yet been fully edited. Some content may change prior to final publication.

I. Introduction

Malware is a collective term for a variety of software with nefarious purposes that accesses a system without the administrator’s permission. Apart from network intrusion techniques [1, 2], malware accounts for most of the security threats on the Internet. Examples of malware include viruses, worms, Trojans, backdoors, rootkits, and more recently, bots [3]. Malware can be spread through various channels. From the malware's point of view, it can be passively downloaded to a computer when a user clicks on a malicious uniform resource locator (URL) to download a file or retrieves a file through a peer-to-peer (P2P) network, or it can actively exploit a vulnerability of a victim host and transfer itself to that host. Before developing countermeasures against malware, it is important to understand how malware behaves and propagates, and what evasion techniques it might implement. Aside from forensic examinations of infected hosts, automated malware collection can be done passively or actively, from the collectors' point of view. Passive malware collection [4, 5] lures malware to enter or attack a specific system in order to gather malware samples, while active malware collection links to or opens malicious URLs or files to collect malware samples. However, one question remains largely unanswered: how different are the malware collected actively and passively? In other words, we would like to check whether the malware samples captured by the two approaches are similar. If the two collections are disjoint, both active and passive approaches should be adopted to collect a comprehensive set of malware samples.

An Active Tool Chain

Other active malware collection frameworks [6-8] collect malware from websites. We found, however, that P2P file sharing has become a more popular channel for malware propagation. Thus, we developed an open-source tool chain [9], Honey-Inspector, which actively collects malware from P2P file sharing in addition to websites. The collected files are scanned by anti-virus scanners to detect malware. The tool chain then analyzes the host and network behaviors of the detected malware.

With this tool chain, along with a passive collection of malware, we intend to answer the following specific questions: (i) What are the types, or distribution, of malware collected by the active approach and the passive approach, respectively? Are these two collections disjoint or overlapping? (ii) Can both approaches capture new or zero-day malware not yet covered by a virus signature? This is the timeliness of the captured malware. (iii) What is the degree of network and host activity, i.e., activeness, of the malware captured by these two approaches? Knowing the answers to these questions, we could collect more and newer malware by adopting the appropriate approach, and detect new malware earlier.

The rest of this article is organized as follows. Malware collection techniques are surveyed in Section II. The proposed framework and modules are presented in Section III. In Section IV, we describe our experiment environment and the results of analyzing the captured malware. Finally, Section V concludes the work and discusses possible future work.

II. Automated Malware Collection Techniques

Server Honeypots vs. Client Honeypots

Honeypots [10] have been an important tool for automated collection of malware samples. Traditionally, a honeypot exposes a variety of popular services with vulnerabilities [4, 5, 11] to passively lure attacks and malware. This type of honeypot is often referred to as a service-side honeypot, or simply a server honeypot. Server honeypots generally cannot capture malware designed to exploit client-side vulnerabilities rather than vulnerabilities in the services running on a server. To detect this kind of malware, the concept of the client honeypot was proposed in 2002 [6]. A client honeypot system typically consists of client software with vulnerabilities and actively interacts with services on the Internet to attract malware designed to attack the client side. Many client honeypot solutions have been proposed [6-8, 12]. HoneyClient [6], HoneyMonkey [7], and Capture-HPC [8] are classified as high-interaction client honeypots. A high-interaction client honeypot system comes with a full client software stack. The vulnerabilities exposed to malware are more realistic, but the system is more complex to set up and has a high runtime overhead. Therefore, there are also client honeypots that emulate client software vulnerabilities instead of running an actual client software stack. Such client honeypots are referred to as low-interaction client honeypots [12]. Even though a low-interaction client honeypot system is easier to deploy and maintain, it may not be able to capture malware that targets vulnerabilities not emulated by the honeypot.


Based on the concept of high-interaction client honeypots, we developed an open-source tool chain that executes malware in a virtual machine. The virtual machines can imitate the behaviors of applications and respond in such a way that each malware sample perceives it is in a real system. In this work, we compare the distribution, timeliness, and activeness of malware discovered by the client honeypot and the server honeypot.

Strategies of Malware Detection and Analysis

Overall, there are two different types of malware detection strategies [13]:

Behavior-based Detection: This technique can detect zero-day attacks which are unknown to malware detectors. The difficulty of this technique is determining what features to observe while running a program, which is usually slow.

Signature-based Detection: This technique defines the characteristics of malware as signatures and stores these signatures in a database. It can then quickly determine whether a program is malicious by checking it against the malware signature database.
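As a rough illustration of the signature-based strategy, the sketch below treats a signature as the SHA-256 digest of a known malicious file and looks a sample up in a small in-memory database. Real scanners use much richer signatures (byte patterns, heuristics); the digest-as-signature simplification, the function names, and the sample bytes are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_signature_db(known_malware: Dict[str, bytes]) -> Dict[str, str]:
    """Map digest -> malware name for every known sample (hypothetical DB)."""
    return {sha256_of(blob): name for name, blob in known_malware.items()}


def signature_scan(sample: bytes, signature_db: Dict[str, str]) -> Optional[str]:
    """Return the matching malware name, or None if no signature matches."""
    return signature_db.get(sha256_of(sample))


# Made-up byte strings standing in for real binaries.
db = build_signature_db({"Example.Trojan": b"MZ-trojan-payload"})
known = signature_scan(b"MZ-trojan-payload", db)
unknown = signature_scan(b"MZ-never-seen-before", db)
```

The lookup is a single dictionary access, which is why this kind of scan is fast. A sample whose digest is not in the database, such as a zero-day sample, passes undetected, and that is the gap behavior-based detection is meant to cover.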

Table 1 shows a comparison between existing high-interaction client honeypot systems [6-8] and our Honey-Inspector. The client honeypot systems mainly use URLs or emails as the source of malware collection, and exercise behavior-based detection. In addition to URLs and emails, P2P file sharing offers an important advantage for malware collection, as the quantity of files exchanged through it is huge. Overall, Honey-Inspector can be regarded as an enhanced version of previous client honeypot systems. It adds P2P to the collection sources and places signature-based scanners before behavior-based detection. This is because the number of suspicious programs is huge and needs to be reduced by the fast signature-based scanners before they are passed to the slow behavior-based detection. Note that Honey-Inspector omits email as a source because receiving email can be considered a passive approach to collecting malware.

Table 1 Comparison between existing client honeypots and Honey-Inspector

Scheme           Sources of Malware Collection     Observations of Malware’s Host Behavior       Observations of Malware’s Network Behavior
HoneyMonkey      malicious URLs                    file system, registry, system configuration   network traffic
HoneyClient      malicious URLs, emails            file system, registry, system process         network traffic
Capture-HPC      malicious URLs                    file system, registry, system process         network traffic
Honey-Inspector  malicious URLs, P2P file sharing  file system, registry, system configuration   network traffic


Fig. 1 The architecture and process of Honey-Inspector

III. The Honey-Inspector

The tool chain, Honey-Inspector, attempts to collect new malware and analyze the behavior of malware. It collects malware through malicious URLs and P2P file sharing software. It also analyzes the host and network behaviors of the captured malware. The entire tool chain consists of Proactive Malware Capture and Detection (PMC&D), Host Behavior Analysis (HBA), and Network Behavior Analysis (NBA). The architecture and process of Honey-Inspector are shown in Fig. 1, and the numbers in Fig. 1 indicate the order of the system workflow.

A. The System Workflow

First, PMC&D proactively collects suspicious samples (e.g., executable files) from P2P file sharing and web crawling. Then, PMC&D divides the downloaded samples into two categories, benign and malicious, using multiple anti-virus scanners. Once a suspicious sample is identified as malicious by any one of the scanners, PMC&D stores it in the database. The HBA and NBA modules analyze the host and network behavior of each collected malware sample, respectively. Both rely on a virtual machine environment to observe the malware’s behavior. After the malware is executed on a virtual machine, it might modify the file system or the registry of the virtual machine. It might also launch network attacks. HBA takes a snapshot of the file system and the registry of the virtual machine before the malware is executed, and then compares the snapshot with the file system and the registry after running the malware. NBA, on the other hand, sniffs and records the network traffic in the virtual machine environment. If the HBA and NBA modules do not reveal any behavior, Honey-Inspector treats the suspicious program as benign.
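The decision rules in this workflow can be sketched as follows. This is a minimal illustration, not Honey-Inspector's actual code: the `Sample` record, the function names, and the three outcome labels are assumptions, and the analysis findings are supplied by hand rather than produced by real scanners or virtual machines.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """A downloaded file plus the analysis results gathered for it."""
    name: str
    scanner_verdicts: Dict[str, bool]  # scanner name -> flagged as malicious?
    host_changes: List[str] = field(default_factory=list)     # HBA findings
    network_packets: List[str] = field(default_factory=list)  # NBA findings


def flagged_by_any_scanner(sample: Sample) -> bool:
    """PMC&D rule from the text: one positive scanner verdict is enough."""
    return any(sample.scanner_verdicts.values())


def triage(sample: Sample) -> str:
    """Return 'not stored', 'benign', or 'malicious' for a sample."""
    if not flagged_by_any_scanner(sample):
        return "not stored"  # no scanner flagged it, so it is not kept
    if not sample.host_changes and not sample.network_packets:
        return "benign"      # flagged, but HBA and NBA observed nothing
    return "malicious"


# Hypothetical samples; the verdicts and findings are made up.
quiet = Sample("a.exe", {"Avast": True, "Kaspersky": False})
noisy = Sample("b.exe", {"Avast": False, "Nod32": True},
               network_packets=["ICMP echo request"])
clean = Sample("c.exe", {"Avast": False, "Kaspersky": False})
results = [triage(s) for s in (quiet, noisy, clean)]
```

The any-scanner rule favors recall over precision at the collection stage, leaving the behavior analysis to filter out flagged samples that turn out to do nothing.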

PMC&D itself does not invoke the execution of a collected sample, and the connection rate of the URL crawling process is controlled. As a result, the active collection mechanism itself will not cause damage to the Internet. The behavior analysis modules (NBA and HBA) do require putting a collected sample into execution. Because the analysis environment is a closed network, Honey-Inspector cannot damage the Internet through this step either.
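One simple way to control a crawler's connection rate is to enforce a minimum interval between requests. The sketch below is an assumed design, since the text does not say how Honey-Inspector limits its rate; the class name, the interval value, and the injectable clock are hypothetical. The demo uses a fake clock so that it runs instantly.

```python
import time


class RateLimiter:
    """Allow at most one request every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


class FakeClock:
    """Stand-in clock for the demo: sleeping just advances the time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
limiter = RateLimiter(2.0, clock=fake.time, sleep=fake.sleep)
delays = [limiter.wait() for _ in range(3)]  # three back-to-back requests
```

In a real crawler the defaults (`time.monotonic` and `time.sleep`) would be used, with `limiter.wait()` called before each download.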

B. Honey-Inspector Modules

The block diagram of Honey-Inspector is shown in Fig. 2. Here we present more details of the three key modules in Honey-Inspector.

Proactive Malware Capture and Detection (PMC&D)

In Fig. 2 (a), PMC&D collects suspicious samples from P2P file sharing and web crawling. The collected samples are then checked by multiple anti-virus scanners. At present, we use four anti-virus scanners: Avast, Avira AntiVir, Kaspersky, and Nod32. Note that these four anti-virus scanners have relatively low false negative rates (i.e., 2.4%) [14]. If a suspicious sample is suspected to be malware by one of the anti-virus scanners, PMC&D stores this sample with the detection results in the database.

Host Behavior Analysis (HBA)

Malicious programs often modify the contents of file systems or registries. In Fig. 2 (b), HBA sets up a new virtual machine based on the Microsoft Windows XP platform and makes a copy of the clean registry and file system. Then, HBA executes the suspicious sample. Once the sample has modified the registry and/or the file system of the virtual machine, HBA uses the DiffReg module to detect changes in the infected registry and the DiffFS module to identify changes in the file system of the infected virtual machine. These modules reveal what the program modified. By checking the modifications, we can identify malware. Finally, HBA stores the comparison results of the malware execution in the database.
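The before/after comparison can be sketched as a dictionary diff. The text does not describe the internals of DiffReg and DiffFS, so the snapshot format used here (a mapping from registry key or file path to a value or content digest), the function name, and all example entries are assumptions.

```python
from typing import Dict, List


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {key_or_path: value_or_digest} snapshots."""
    added = sorted(key for key in after if key not in before)
    removed = sorted(key for key in before if key not in after)
    modified = sorted(
        key for key in before if key in after and before[key] != after[key]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical registry snapshots taken before and after running a sample.
registry_before = {r"HKLM\Software\Example\Version": "1.0"}
registry_after = {
    r"HKLM\Software\Example\Version": "1.0",
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run\updater": "dropper.exe",
}
registry_diff = diff_snapshots(registry_before, registry_after)

# Hypothetical file system snapshots: path -> content digest.
fs_before = {"C:/WINDOWS/system32/drivers/etc/hosts": "d1", "C:/boot.ini": "b1"}
fs_after = {"C:/WINDOWS/system32/drivers/etc/hosts": "d2",
            "C:/WINDOWS/Temp/dropper.exe": "x9"}
fs_diff = diff_snapshots(fs_before, fs_after)
```

An empty result in all three lists for both snapshots would correspond to a sample that shows no host behavior.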

Fig. 2 The block diagram of the tool chain mechanism


Network Behavior Analysis (NBA)

Some malicious programs generate network traffic. In Fig. 2 (c), NBA sets up a new virtual machine and executes a suspicious program for a period of time. The Netsniff module monitors the network traffic of the virtual machine. If a suspicious program generates network traffic, such as sending an email or ICMP packet, the Netsniff module will sniff and store the packet traces in the database.
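Once a packet trace has been stored, deciding whether a sample shows network behavior can be as simple as checking that the trace is non-empty and summarizing it by protocol. The sketch below assumes a simplified record of (protocol, destination) per packet; this format, the function names, and the example trace are hypothetical and do not reflect how Netsniff stores its traces.

```python
from collections import Counter
from typing import List, Tuple

Packet = Tuple[str, str]  # (protocol, destination), an assumed record format


def summarize_trace(packets: List[Packet]) -> Counter:
    """Count the packets in a trace per protocol."""
    return Counter(protocol for protocol, _destination in packets)


def has_network_behavior(packets: List[Packet]) -> bool:
    """A sample shows network behavior if it sent any packet at all."""
    return len(packets) > 0


# Made-up trace using documentation-reserved IP addresses.
trace = [("SMTP", "192.0.2.25"), ("ICMP", "198.51.100.7"), ("SMTP", "192.0.2.25")]
summary = summarize_trace(trace)
active = has_network_behavior(trace)
silent = has_network_behavior([])
```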

IV. Experimental Results

We present the results of a month-long experiment on malware collection and analysis. Within this period, we observed 800 unique malware samples actively collected by Honey-Inspector running on one personal computer with several virtual machines. Over the same period, we obtained 354 unique malware samples from the passive honeypot system of the National Center for High-Performance Computing (NCHC) in Taiwan. The NCHC passive honeypot system is operated under the international Honeynet Project [11]. It consists of 3600 server honeypots deployed across 9 academic network centers in Taiwan. The volume of network traffic through the system is approximately 100 GB per hour. All server honeypots are set up with vulnerabilities selected by referring to the National Vulnerability Database [16] to lure malware.

Distribution: More Trojans but Fewer Bots by Honey-Inspector

The distributions of captured malware for Honey-Inspector and the passive honeypot system are shown in Fig. 3 (a) and Fig. 3 (b), respectively. All collected samples are filtered by four anti-virus scanners. For the passive system, 79% of the captured malware are bots and 21% are worms, Trojans, and other malware. Since bots spread by scanning computer vulnerabilities through the network, they can be easily captured by the passive honeypot. In contrast, 59% of the malware captured by Honey-Inspector are Trojans, and 41% are bots, worms, and other malware. Trojans are usually contained inside apparently harmless programs so that they can gain control and do their chosen form of damage. As a result, Honey-Inspector captures Trojans with a higher likelihood, as it actively seeks potential malware binaries from multiple sources. The passive honeypot system captures a Trojan binary only when the Trojan initiates a remote attack. Hence, the percentage of captured Trojans is small in the passive honeypot. The result from the passive honeypot system is consistent with related studies such as the work by Chamotra et al. [18].

In addition, we analyze the quantity of captured malware samples from URLs and P2P file-sharing software. We discover that the quantity of captured malware from P2P file-sharing software is several orders of magnitude higher than that from URLs. This is primarily due to the methodology employed by the peers to search for content. Hence, P2P file-sharing is the dominant source for malware collection.


Fig. 3 The distribution of captured malware for Honey-Inspector and the passive honeypot system

Timeliness: Honey-Inspector Collects Newer Malware

We evaluate the timeliness of the two collection systems by looking at the capture time of malware. The capture time is defined as the number of days from the time when the malware’s signature is defined in the anti-virus scanner’s signature database to the time when the malware is first found by the collection system. A positive capture time implies a less timely collection system: the collected malware was already known to anti-virus scanners when it was found. A negative capture time means the collection system found the malware before it was known to the anti-virus scanner. For the experiment, we refer to Kaspersky’s malware signature database for the signature definition time. We use Kaspersky as the reference because it is one of the most popular anti-virus scanners on the market and, importantly, it is the only one we are aware of that publicizes the per-malware signature definition time.
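The capture time definition translates directly into a date subtraction. In the sketch below the function names and all dates are hypothetical; only the sign convention (positive means the signature already existed at capture, negative means a zero-day capture) comes from the definition above.

```python
from datetime import date
from typing import List


def capture_time_days(signature_defined: date, first_captured: date) -> int:
    """Days from signature definition to first capture (negative = zero-day)."""
    return (first_captured - signature_defined).days


def zero_day_share(capture_times: List[int]) -> float:
    """Fraction of samples captured before their signature was defined."""
    if not capture_times:
        return 0.0
    return sum(1 for days in capture_times if days < 0) / len(capture_times)


# Made-up dates for illustration.
known = capture_time_days(date(2012, 1, 1), date(2012, 3, 1))   # signature came first
early = capture_time_days(date(2012, 3, 10), date(2012, 3, 1))  # captured first
share = zero_day_share([known, early, 120, 2100])
```

Here `known` is 60 days (2012 is a leap year), `early` is -9 days, and one of the four hypothetical samples counts as zero-day.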

Malware signatures uniquely identify specific malware. The anti-virus scanners have their signature databases updated periodically. In this way, we establish whether the malware is detected by an anti-virus scanner and, if not, how long it takes until a new signature is added to this scanner to detect this zero-day malware. Therefore, we have a timeline analysis that indicates when each zero-day malware is first recognized by each of these anti-virus scanners.

Fig. 4 (a) shows the capture time of malware using the active Honey-Inspector and the passive honeypot system. We observe that most of the malware (i.e., 84% of the collected malware) were collected by Honey-Inspector after their signatures had been defined in Kaspersky’s malware signature database. About 16% of the collected malware were collected by Honey-Inspector before their signatures were defined. However, for the passive honeypot system, all the collected malware had already been defined in Kaspersky. Based on the above results, we can clearly see that Honey-Inspector is timelier in collecting new types of malware than the passive honeypot system.

Fig. 3 panel data. (a) Honey-Inspector: Trojan 59%, Others 21%, Bot 12%, Worm 8%. (b) The passive honeypot system: Bot 79%, Others 13%, Trojan 5%, Worm 3%.

Fig. 4 (b) shows the capture time for collecting bots using Honey-Inspector and the passive honeypot system. For Honey-Inspector, similarly, about 84% of the collected bots had already been defined in Kaspersky, and about 16% are new. In the case of the passive honeypot system, all the collected bots had been defined in Kaspersky more than 100 days earlier, and most were collected between 2001 and 2500 days after their signatures were defined. Therefore, compared to the passive honeypot system, Honey-Inspector can capture much newer bots. In addition, we observe that the bots collected by the two systems are mostly disjoint. Specifically, the number of bots collected by both systems is less than 1% of the total number of bots collected. One possible reason for this phenomenon is that the one-month period of malware collection may not be long enough.
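An overlap figure of this kind can be computed with set operations once each sample is reduced to an identifier such as a file digest. The sketch below assumes that "total number of bots collected" means the union of the two collections, which the text does not state explicitly; the function name and the identifiers are hypothetical.

```python
from typing import Set


def overlap_ratio(active: Set[str], passive: Set[str]) -> float:
    """Share of all distinct samples that appear in both collections."""
    union = active | passive
    if not union:
        return 0.0
    return len(active & passive) / len(union)


# Made-up sample identifiers standing in for file digests.
active_bots = {"bot-a", "bot-b", "bot-c"}
passive_bots = {"bot-c", "bot-d"}
ratio = overlap_ratio(active_bots, passive_bots)          # 1 shared of 4 distinct
disjoint = overlap_ratio({"bot-a"}, {"bot-z"})            # nothing shared
```

A ratio near zero, as reported for the bots, indicates that the two approaches see largely different populations.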

Fig. 4 The distribution of capture time of malware and bots for Honey-Inspector and the passive honeypot system

Honey-Inspector Captures Malware with Network Behavior, Host Behavior, or Both

Table 2 shows the behavior analysis results of collected malware. We classify the collected malware into four different classes based on the behaviors exhibited as follows.

Class A: Malware that exhibits both host behavior and network behavior (strong activeness).

Class B: Malware that only exhibits network behavior (strong activeness).

Class C: Malware that only exhibits host behavior (weak activeness).

Class D: Malware that exhibits no behavior (weak activeness).

Table 2 Classification of behaviors of collected malware

Class   The passive honeypot system   Honey-Inspector
A       67%                           48%
B       31%                           29%
C       1%                            12%
D       1%                            11%

We regard malware with network behavior as exhibiting strong activeness, because this kind of malware can attack remote hosts through the network. The passive honeypot system mostly collects malware with network behavior: 98% of its collected malware falls in class A or class B. It rarely captures malware with only host behavior or no behavior at all, i.e., weak activeness: only 2% of its collected malware falls in class C or class D. The reason might be that the passive honeypot system lures malware that actively infects hosts through network behaviors. In comparison, 77% of the malware collected by Honey-Inspector belongs to class A or class B, and 23% belongs to class C or class D.
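The four classes and the strong/weak split can be written down compactly. In the sketch below the percentages are copied from Table 2, while the function names and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, Tuple


def classify(host_behavior: bool, network_behavior: bool) -> str:
    """Map observed behaviors to the classes A-D defined in the text."""
    if host_behavior and network_behavior:
        return "A"
    if network_behavior:
        return "B"
    if host_behavior:
        return "C"
    return "D"


def activeness_split(shares: Dict[str, int]) -> Tuple[int, int]:
    """Return (strong, weak) percentages: classes A+B versus C+D."""
    return shares["A"] + shares["B"], shares["C"] + shares["D"]


# Percentages from Table 2.
TABLE_2 = {
    "passive honeypot": {"A": 67, "B": 31, "C": 1, "D": 1},
    "Honey-Inspector": {"A": 48, "B": 29, "C": 12, "D": 11},
}
passive_split = activeness_split(TABLE_2["passive honeypot"])
active_split = activeness_split(TABLE_2["Honey-Inspector"])
```

The resulting splits, 98% versus 2% for the passive system and 77% versus 23% for Honey-Inspector, match the figures quoted in the text.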

V. Conclusions and Future Work

In this work, we compared the passive server honeypot with the active client honeypot. A new open-source malware tool chain, Honey-Inspector, is used as the active client honeypot to perform active collection, detection, and analysis of malware. Honey-Inspector can capture malware samples that exhibit both strong and weak activeness. In contrast, since the conventional passive system is a server honeypot, it collects almost exclusively (98%) malware samples that exhibit strong activeness. We also analyzed the distribution and timeliness of captured malware. Bots make up the highest proportion of captured malware in the passive honeypot system, whereas most of the malware captured by Honey-Inspector are Trojans. Because bots attack actively, they can be captured easily by the passive honeypot system. Trojans usually hide as files on websites or P2P file sharing systems, which await access from victim clients. They can be most effectively collected by an active client honeypot such as Honey-Inspector. We also notice from the experimental results that Honey-Inspector can capture much newer malware than the passive honeypot system.

We shall extend the period of malware collection time and find out if the findings can be generalized to a longer time-scale. In addition, we shall add more sources of malware collection, such as crawling the shared links on social networks, and improve the behavior analysis process further to enhance the system’s overall collection capability.

Acknowledgements

This work was supported in part by National Communications Commission (NCC), National Science Council (NSC), and Chunghwa Telecom Co., Ltd. of Taiwan.


References

[1] N. Provos, D. McNamee, P. Mavrommatis, K. Wang, and N. Modadugu, “The ghost in the browser: analysis of web-based malware,” in Proceedings of the 2007 Workshop on Hot Topics in Understanding Botnets (HotBots), 2007.[Online]. http://static.usenix.org/event/hotbots07/tech/full_papers/provos/provos.pdf

[2] M. van Gundy and H. Chen, “Noncespaces: using randomization to enforce information flow tracking and thwart cross-site scripting attacks,” in Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS), 2009.

[3] C. Kanich, K. Levchenko, B. Enright, G. M. Voelker, and S. Savage, “The heisenbot uncertainty problem: challenges in separating bots from chaff,” in Proceedings of First Usenix Workshop on Large-scale Exploits and Emergent Threats, 2008, article no. 10.

[4] P. Baecher, M. Koetter, T. Holz, M. Dornseif, and F. Freiling, “The nepenthes platform: an efficient approach to collect malware,” in Proceedings of ninth Recent Advances in Intrusion Detection, 2006, pp. 165-184.

[5] L. Spitzner, Honeypots: Tracking Hackers. Boston, MA: Addison-Wesley, 2002.[6] K. Wang, Honeyclient Development Project.

[Online]. http://www.honeyclient.org/trac/[7] Y. M. Wang, “Strider HoneyMonkeys: active client-side honeypots for finding web

sites that exploit browser vulnerabilities,” in Part of Works in Progress at the 14th USENIX Security Symposium, 2007.[Online]. www.usenix.org/event/sec05/wips/wang.pdf

[8] Capture-HPC.[Online]. https://projects.honeynet.org/capture-hpc/

[9] A Malware Tool Chain: Honey-Inspector.[Online]. http://honeyinspector.sourceforge.net/

[10]A. Mairh, D. Barik, K. Verma, and D. Jena, “Honeypot in network security: a survey,” in Proceedings of the 2011 International Conference on Communication, Computing & Security, 2011, pp. 600-605.

[11] The Honeynet Project. [Online]. http://www.honeynet.org/node/157

[12] Y. Alosefer and O. Rana, “Honeyware: a web-based low interaction client honeypot,” in Proceedings of the Third International Conference on Software Testing, Verification, and Validation Workshops (ICSTW), 2010, pp. 410-417.

[13] N. Idika and A. P. Mathur. (Feb. 2007). A Survey of Malware Detection Techniques. Department of Computer Science, Purdue University, West Lafayette, IN. [Online]. http://www.serc.net/system/files/SERC-TR-286.pdf

[14] AV-Comparatives. (Mar. 2012). On-demand Detection of Malicious Software. AV-Comparatives Lab, Innsbruck, Austria. [Online]. http://www.av-comparatives.org/images/docs/avc_fdt_201203_en.pdf

[15] M. Ramilli, M. Bishop, and S. Sun, “Multiprocess malware,” in Proceedings of the Sixth International Conference on Malicious and Unwanted Software, 2011, pp. 8-13.

[16] National Vulnerability Database. [Online]. http://nvd.nist.gov/

[17] C. Rossow, C. J. Dietrich, C. Kreibich, C. Grier, V. Paxson, N. Pohlmann, H. Bos, and M. van Steen, “Prudent practices for designing malware experiments: status quo and outlook,” in Proceedings of the IEEE Symposium on Security and Privacy, 2012, pp. 65-79.


[18] S. Chamotra, R. K. Sehgal, R. Kamal, and J. S. Bhatia, “Data diversity of a distributed honey net based malware collection system,” in Proceedings of International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), 2011, pp. 125-129.


Author Biographies:

Ying-Dar Lin is a professor in the Department of Computer Science and the founder and director of the Network Benchmarking Lab (http://www.nbl.org.tw/) and the Embedded Benchmarking Lab (http://www.ebl.org.tw/), National Chiao Tung University, Hsinchu, Taiwan. His research interests include design, analysis, implementation, and benchmarking of network protocols and algorithms; quality of service; network security; deep-packet inspection; P2P networking; and embedded hardware/software codesign. Prof. Lin has a Ph.D. degree in computer science from the University of California, Los Angeles. He is an IEEE Fellow and serves on the editorial boards of IEEE Transactions on Computers, Computer, IEEE Wireless Communications, IEEE Network, IEEE Communications Magazine--Network Testing Series, IEEE Communications Surveys and Tutorials, IEEE Communications Letters, Computer Communications, Computer Networks, and IEICE Transactions on Information and Systems.

Chia-Yin Lee received his Ph.D. degree in computer science from National Chung Cheng University, Chiayi, Taiwan in 2010. From February 2011 to July 2011, he was an adjunct assistant professor in the Department of Computer Science and Information Management, Providence University, Taichung, Taiwan. Since August 2011, he has been a postdoctoral researcher at the Information & Communication Technology Laboratories, National Chiao Tung University, Hsinchu, Taiwan. His research interests include cryptography, network security, and image processing. Dr. Lee is a member of the IEEE.

Yu-Sung Wu received the B.S. degree in Electrical Engineering from National Tsing Hua University, Hsinchu, Taiwan in 2002, and the Ph.D. degree in Electrical and Computer Engineering from Purdue University, West Lafayette, Indiana in 2009. He is an assistant professor in the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, where he leads the Laboratory of Security and Systems. His research interests include security, dependability, and systems.

Pei-Hsiu Ho received the M.S. degree in information management from Southern Taiwan University of Technology, Tainan, Taiwan in 2003 and the Ph.D. degree in Computer Science from National Sun Yat-sen University, Kaohsiung, Taiwan in 2011. Since 2011, she has been a postdoctoral researcher in the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. Her current research interests include cryptographic protocols and mobile security.

Fu-Yu Wang received his B.S. and M.S. degrees in Computer Science and Information Engineering from Chung-Hua University, Hsinchu, Taiwan in 2005 and 2007, respectively. Since 2008, he has been a project manager at the Network Benchmarking Laboratory (http://www.nbl.org.tw/), National Chiao Tung University, Hsinchu, Taiwan. He is currently working on network and mobile security.

Yi-Lang Tsai is a researcher at the National Center for High-Performance Computing, founder and director of the Cloud Security Alliance Taiwan Chapter, leader of The Honeynet Project Taiwan Chapter, and leader of the Information Security Incident Response Team, which handles security incidents for the Taiwan Academic Network (TANet) and the Taiwan Advanced Research & Education Network (TWAREN). He is also a well-known IT commentator and author in Taiwan, with 33 published books and many columns in professional IT publications. His research interests include honeypot-related technologies and cloud security technologies for industry, government, and academia.


Contact Information:

Ying-Dar Lin
Department of Computer Science, National Chiao Tung University, No. 1001, Ta-Hsueh Rd., Hsinchu City 300, Taiwan
Tel: +886-3-5712121 ext. 31899, 59228
Fax: +886-3-5721490, +886-3-5131341
Email: [email protected]: http://www.cs.nctu.edu.tw/~ydlin

Chia-Yin Lee
R601, MISRC Building, National Chiao Tung University, No. 1001, Ta-Hsueh Rd., Hsinchu City 300, Taiwan
Tel: +886-3-5736727 ext. 233
Fax: +886-3-5131341
Email: [email protected]

Yu-Sung Wu
Department of Computer Science, National Chiao Tung University, No. 1001, Ta-Hsueh Rd., Hsinchu City 300, Taiwan
Tel: +886-3-5712121 ext. 56623
Fax: +886-3-5721490
Email: [email protected]: http://www.cs.nctu.edu.tw/~ysw

Pei-Hsiu Ho
R701, MISRC Building, National Chiao Tung University, No. 1001, Ta-Hsueh Rd., Hsinchu City 300, Taiwan
Tel: +886-3-5712121 ext. 56667
Fax: +886-3-5131341
Email: [email protected]

Fu-Yu Wang
R601, MISRC Building, National Chiao Tung University, No. 1001, Ta-Hsueh Rd., Hsinchu City 300, Taiwan
Tel: +886-3-5736727 ext. 211
Fax: +886-3-5131341
Email: [email protected]

Yi-Lang Tsai
National Center for High-Performance Computing, No. 28, Nanke 3rd Rd., Xinshi Dist., Tainan City 741, Taiwan
Tel: +886-6-5050940 ext. 749
Fax: +886-6-5055909
Email: [email protected]

Digital Object Identifier 10.1109/MC.2013.226 0018-9162/$26.00 2013 IEEE

This article has been accepted for publication in Computer but has not yet been fully edited. Some content may change prior to final publication.