Techniques and Solutions for Addressing Ransomware Attacks
A dissertation presented in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in the field of
Information Assurance
by
Amin Kharraz
College of Computer and Information Science
Northeastern University
Ph.D. Committee
Engin Kirda Advisor, Northeastern University
William Robertson Advisor, Northeastern University
Long Lu Internal Member, Northeastern University
Manuel Egele External Member, Boston University
December 2017
Abstract
Ransomware is a form of extortion-based attack that locks the victim's digital
resources and requests money to release them. Although the concept of ransomware
is not new (such attacks date back at least as far as the 1980s), this type of
malware has recently experienced a resurgence in popularity. In fact, over the last
few years, a number of high-profile ransomware attacks have been reported. Very recently,
the WannaCry ransomware infected thousands of vulnerable machines around the world
and substantially disrupted critical services such as the British healthcare system. Given
the size and variety of threats we face today, solutions that can effectively
detect and analyze unknown ransomware attacks appear to be necessary.
In this thesis, we argue that it is possible to develop novel defense mechanisms,
and protect user data from a large number of ransomware attacks with zero data
loss. We show that such an approach is both feasible and effective. To support this
claim, we investigate how a successful ransomware attack interacts with operating
system resources. In the first part of this thesis, we perform an evolutionary-based
analysis to understand the destructive behavior of these attacks. We show that by
monitoring filesystem activity, it is possible to design practical defense systems that
can stop a large number of ransomware attacks, even those using sophisticated
encryption capabilities. In the second part, we propose a novel dynamic analysis
system, called Unveil, that is designed to analyze ransomware attacks and model
their interactions with the filesystem. In the third and last part, we propose an
end-point framework, called Redemption, to protect user data from ransomware
attacks. We demonstrate that our proposed solution can be retrofitted into existing
operating systems and achieve zero data loss against current ransomware families
without introducing alarm fatigue.
Acknowledgment
A Ph.D. is a long journey that requires patience and perseverance. Along the
way, you may define uninteresting projects, fail several times, or even ask the
wrong questions, but that is fine, since graduate school is the best place to fail and
learn from your failures. Eventually, you learn which problems are
interesting, how to define undefined problems, and why the community should care about
these problems. These are part of the valuable skill set that one earns during
a Ph.D.
Getting a Ph.D. is not possible without great people around. I would like to
particularly thank Prof. Engin Kirda, my great advisor, who supported
me from the very beginning. I would also like to thank my co-advisor, Prof. Wil
Robertson. Wil helped me a lot in developing great ideas. It was an honor to be part
of Boston SecLab and work with him.
I thank my committee members Prof. Long Lu and Prof. Manuel Egele for reading
this thesis, and providing valuable feedback.
I thank all the great teachers I had in Bushehr, my hometown. Mr. Rashidpour,
my math teacher, was a great mentor and one of the people who encouraged me
to become an engineer.
I also thank the most recent cadre of students, coworkers, and friends who made
grad school so much fun, including Andrea, Michael, Kaan, Sajjad, Saman, Reza,
Ahmet, Talha, Tobi, Sevtap, and Amirali.
I thank my parents and my wonderful in-laws for their love and support. Lastly,
I thank my wonderful wife, Mahboubeh, for bearing with me through difficult times
and making life much easier. Mahboubeh supported every decision I made
in this journey, helped me resume life after each paper rejection, and revive
dreams after failures. Thank you, my sweetheart!
Contents
1 Introduction
1.1 Motivation for Novel Defenses
1.2 Limitations of Current Defense Mechanisms
1.3 Overview of the Dissertation
2 An Analysis on Current Ransomware Attacks
2.1 Introduction
2.2 Ransomware Data Set
2.2.1 Experimental Setup
2.3 Characterization and Evolution
2.3.1 File System Activity
2.3.2 Mitigation Strategies
2.4 Financial Incentives
2.4.1 Bitcoin as a Charging Method
2.5 Related Work
2.6 Conclusions
3 A Dynamic Analysis Approach to Detecting Ransomware
3.1 Introduction
3.2 Background
3.3 Unveil Design
3.3.1 Detecting File Lockers
3.3.2 Detecting Screen Lockers
3.4 Unveil Implementation
3.4.1 Generating User Environments
3.4.2 Filesystem Activity Monitor
3.4.3 Desktop Lock Monitor
3.5 Evaluation
3.5.1 Experimental Setup
3.5.2 Ground Truth (Labeled) Dataset
3.5.3 Detecting Zero-Day Ransomware
3.5.4 Case Study: Automated Detection of a New Ransomware Family
3.6 Discussion and Limitations
3.7 Related Work
3.8 Conclusions
4 Protecting End-Points from Ransomware Attacks
4.1 Introduction
4.2 Related Work
4.3 Threat Model
4.4 Design Overview
4.5 Detection Approach
4.5.1 Content-based Features
4.5.2 Behavior-based Features
4.5.3 Evaluating the Feature Set
4.5.4 Malice Score Calculation (MSC) Function
4.6 Implementation
4.7 Evaluation
4.7.1 Dataset
4.7.2 Detection Results
4.7.3 Disk I/O and File System Benchmarks
4.7.4 Real-world Application Testing
4.7.5 Usability Experiments
4.8 Discussion and Limitations
4.9 Conclusions
5 Conclusions
List of Figures
2-1 The malicious process attempts to get the file map on the physical disk to overwrite the file's data after the encryption.
2-2 Disk layout for files with different sizes in the NTFS file system.
2-3 A ransomware attack (Gpcode) with a simple delete operation.
2-4 The amount of ransom money among common ransomware families.
2-5 The number of Bitcoins per address.
2-6 The total number of transactions per Bitcoin address.
2-7 The duration of activity for Bitcoin addresses.
2-8 The duration of activity for Bitcoin addresses.
3-1 Overview of the design of the I/O access monitor in Unveil.
3-2 Different attack strategies among ransomware families with respect to I/O access patterns.
3-3 Precision-recall analysis of the tool.
3-4 Evolution of VT scanner reports after six submissions.
3-5 I/O activities of a previously unknown ransomware family detected by Unveil.
4-1 Redemption mediates access to the file system and redirects each write request on user files to a protected area without changing the status of the original file.
4-2 The steps involved in committing the benign changes to the files.
4-3 TP/FP analysis of Redemption based on the best threshold value.
List of Tables
2.1 The list of malware families used in our experiments.
2.2 The IRP requests generated during the CryptoWall attack.
2.3 The types of IRPs requested by a malicious process to encrypt and overwrite the victim's files during a ransomware attack.
2.4 A set of IRP requests generated on behalf of a malicious process to delete files during an attack.
2.5 Summary of types of charges in 15 ransomware families.
3.1 The list of ransomware families used in the first experiment.
3.2 The list of benign applications that generate I/O access patterns similar to ransomware.
3.3 An example of I/O access in Unveil for CryptoWall 3.0 and CryptoWall 4.0.
3.4 I/O accesses for deletion and compression mechanisms in benign/malicious applications.
3.5 Unveil detection results.
4.1 A list of benign applications and their malice scores.
4.2 A list of ransomware families and their malice scores.
4.3 The ransomware families used to test Redemption and other proposed techniques.
4.4 Disk I/O performance in a standard and a Redemption-protected host.
4.5 Runtime overhead of Redemption on a set of end-point applications.
Chapter 1
Introduction
1.1 Motivation for Novel Defenses
Malware attacks continue to be one of the most popular attack vectors in the
wild [101, 73]. Among all classes of malware, ransomware has recently become very
popular among malware authors [12, 29, 34, 44]. Ransomware is a kind of scareware
that locks the victims' computers until they make a payment to regain access to their
data. This class of malware is not a new concept (such attacks have been in
the wild since the previous decade), but the growing number of high-profile ransomware
attacks has raised concerns about how to defend against this class of
malware.
In 2016, several public and private sectors, including the healthcare industry, were
impacted by ransomware [22, 13, 107]. Lately, US officials have also expressed their
concerns about ransomware [38, 49], and have even asked the U.S. government to focus
on fighting ransomware under the Cybersecurity National Action Plan [49]. Most
recently, the WannaCry ransomware attack impacted thousands of users around the world
by exploiting the EternalBlue vulnerability, encrypting user data, and demanding bitcoin
payments in exchange for unlocking files [75].
In response to the increasing number of ransomware attacks, users are often ad-
vised to create backups of their critical data. Certainly, having a reliable data backup
policy minimizes the potential costs of being infected with ransomware, and is an
important part of the IT management process. However, the growing number of paying
victims [15, 81, 40] suggests that technically unsophisticated users, who are the main
target of these attacks, do not follow these recommendations and easily become
paying victims of ransomware. Hence, ransomware authors continue to create new
attacks and evolve their creations, as evidenced by the emergence of more sophisticated
ransomware every day [103, 11, 101, 86].
Unfortunately, many of the recent security reports about ransomware [33, 52,
101, 73] mainly focus on the advancements in ransomware attacks and their levels
of sophistication, rather than providing insights into effective defense
techniques that should be adopted against this threat. Furthermore, the current
mechanisms to detect, analyze, and defend against ransomware are not very different
from the ones used to detect other types of evasive malware. Presumably, the
main assumption here is that this class of malware employs all possible evasion
techniques, like other classes of malware, to bypass detection tools, reach end-users,
and successfully launch attacks. While we agree that this is a valid assumption, we
claim that these mechanisms do not lead to the best defenses against
ransomware, as evidenced by the increasing number of very successful ransomware
attacks in the wild.
1.2 Limitations of Current Defense Mechanisms
Ransomware attacks share some similarities with other types of malware particu-
larly in utilizing evasion techniques, payload distribution and infection mechanisms.
For example, similar to other types of malware attacks, opening email attachments
or clicking on malicious advertisements can increase the risk of being infected with
ransomware. Consequently, some of the current techniques that are used to identify
suspicious links or payloads in email attachments are still useful in detecting malicious
binaries that deliver ransomware.
Furthermore, like other classes of malware, most current ransomware samples
need to communicate with Command and Control (C&C) servers to receive commands
and eventually run attacks on infected machines (e.g., requesting the encryption key
from the remote server). Consequently, the techniques to analyze DGA-based
domains or identify malicious network traffic [16, 82, 90, 18, 17] can be incorporated
into detecting some types of ransomware attacks. Similarly, some static
analysis techniques, such as Portable Executable (PE) analysis tools or packer detection
techniques, can still provide helpful reports about the corresponding malicious binary.
However, current tools cannot provide security analysts with useful information about
the specific behavior of a given ransomware sample, which ultimately results in an
increasing number of unknown attacks on end-user machines, as evidenced by a set of
very successful ransomware attacks.
Unlike most modern malware attacks, ransomware attacks are not usually
designed to be stealthy after the infection phase, as the whole point of the attack is to
notify victims that their machines are infected. Furthermore, the core functionality
of a ransomware sample (i.e., the cryptosystem), which results in data destruction,
works very similarly to a class of benign applications that are often used for archiving
or privacy-preserving purposes. In fact, the similarity of ransomware functionality
to a set of benign applications, as well as its differences from other types of malware
in attack strategy, has made current automated analysis techniques
less effective in detecting such attacks and protecting end-users. Therefore, it is
useful to develop tools that can accurately extract ransomware behaviors, in light of
these similarities and differences, and improve current automated analysis systems to
detect this class of malware more effectively.
1.3 Overview of the Dissertation
In this thesis, we investigate the feasibility of developing solutions to detect and
analyze ransomware attacks. The thesis of this dissertation is that, unlike other
malware, the nature of ransomware attacks is not very broad, and protecting against
a large number of ransomware attacks is possible. We argue that ransomware attacks
follow very similar patterns in order to be successful and force victims to pay the
ransom fee. For example, unlike other classes of malware that aim to be stealthy to
collect banking credentials or keystrokes without raising suspicion, ransomware
notifies victims that they are infected. Moreover, a successful ransomware attack usually
needs to prevent users from accessing their own data by performing encryption and/or
deletion operations, and it repeats these destructive actions throughout an attack. This
thesis aims to show that if we use these insights on the defense side and accurately model
these behaviors, we can reliably detect a significant number of ransomware attacks in the
wild.
In the first part of this thesis, we perform an evolutionary-based analysis of
ransomware attacks to understand the main characteristics of these attacks [56]. This
work is motivated by our need to study the core functionalities of these attacks from a
filesystem perspective. To this end, we created a dataset of ransomware samples that
covers the majority of the existing ransomware families that have been observed in
the wild. We designed and implemented a kernel-level module to closely monitor the
interaction of user-mode processes with the filesystem. Our analysis shows that different
classes of ransomware attacks with multiple levels of sophistication share very similar
characteristics from a filesystem perspective due to the nature of these attacks.
In the second part of this thesis, we present a novel dynamic analysis system,
called Unveil [53], that is designed to analyze ransomware attacks and model their
behaviors. In our approach, the system automatically creates an artificial, realistic
execution environment and monitors how ransomware interacts with that environment.
We evaluate Unveil using more than 148,000 distinct samples belonging to different
malware families. The evaluation shows that our approach was able to
correctly detect 13,637 ransomware samples from multiple ransomware families in a
real-world data feed with zero false positives. Our analysis shows that Unveil can
significantly enhance current anti-malware solutions with regard to ransomware.
In the third part of the thesis, we investigate the possibility of protecting user data
from ransomware attacks at end-hosts with zero data loss. To this end, we propose
a general framework, called Redemption [54], to augment the operating system
with ransomware protection capabilities. Redemption does not require any
significant changes to the semantics of the underlying filesystem functionality, or
modifications to the architecture of the operating system.
This thesis consists of the following chapters:
In Chapter 2, we provide an overview of current ransomware attacks and the techniques
they use. In Chapter 3, we describe a dynamic analysis system that is specifically
designed to detect and analyze ransomware samples. Chapter 4 describes an end-point
solution to protect the consistent state of user data during a ransomware attack.
Finally, Chapter 5 concludes the thesis.
Chapter 2
An Analysis on Current Ransomware Attacks
2.1 Introduction
Over the past few years, a class of malware known as scareware has become popular
among cybercriminals. This malware takes advantage of people's fear of revealing
their private information, losing their critical data, or facing irreversible hardware
damage. In particular, this work focuses on ransomware, a class of scareware
that locks the victims' computers until they make a payment to regain access
to their data.
Although the first version of ransomware appeared in the wild almost 10 years
ago, the volume of ransomware incidents was not significant until a couple of years
ago. As the number of ransomware attacks increased by over 500% in 2013 compared to
previous years, the ransomware threat made the headlines as the most notable
malware trend after targeted attacks in 2013 [101]. For example, the Cryptolocker
ransomware alone managed to infect approximately 250 thousand computers around
the world, including an entire police department that needed to pay a ransom to
decrypt its documents [39, 88].
Given the significant growth of ransomware attacks [101], it is very important
to develop protection techniques against this type of malware. However, designing
effective defense mechanisms is not practically possible without an insightful
understanding of these attacks. Currently, many of the recent security reports about
ransomware [33, 39] rely on ad-hoc procedures rather than a scientific assessment.
Moreover, these reports mainly focus on the advancements in ransomware attacks
and their levels of sophistication, rather than providing insights into effective
defense techniques that should be adopted against this threat. In this work, we
investigate the key functionalities of ransomware samples so that we can propose
effective detection mechanisms leveraging our findings.
We created a collection of ransomware samples that were categorized into 15 different
families. Our data set covers the majority of the existing ransomware families that
have been observed in the wild between 2006 and 2014. The data set was created
using multiple sources, including manual and automatic crawling of public malware
repositories, and the ransomware samples submitted to Anubis [21] since 2011. The
results of our analysis confirm the folk wisdom that such attacks show a continuous
increase in the number of families and distinct samples per year [77, 101], as well as
advances in certain aspects of the specific functionalities of a few ransomware families.
However, our results also reveal that in a significant number of samples, the core parts
of the ransomware lack the technical complexity to perform successful attacks.
While a small fraction of the samples can really prevent the victims from accessing their
resources and cause severe problems, a significant number of samples fail to seriously
take the victims' resources hostage. More specifically, we show that more than
94% of ransomware samples in our data set simply try to lock the victims' computer
desktop and request a ransom, or use very similar and superficial approaches to target
the victims' resources.
We also performed an analysis of the charging methods adopted by different
ransomware families and traced the transactions of 1,872 Bitcoin addresses that were
used during the Cryptolocker attack. The analysis of the transactions shows that
cybercriminals started to adopt evasive techniques (e.g., using new addresses for each
infection to keep the balances low) in order to better conceal the criminal activity
of the Bitcoin accounts. Our analysis also confirms that the Bitcoin addresses used
for malicious purposes share similar transaction records (e.g., short activity period,
small Bitcoin amounts, small number of transactions). However, identifying
malicious addresses in the Bitcoin network based on the transaction history remains
difficult, in particular when cybercriminals use multiple independent addresses with
small amounts of Bitcoins.
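The transaction-record traits mentioned above (activity period, amount received, number of transactions) can be computed per address roughly as follows. This is a minimal sketch, not the tooling used in this study. The `Tx` record, the `looks_like_ransom_address` filter, and its thresholds are hypothetical, and the sketch assumes transaction data has already been fetched from a block explorer or a local node. As noted above, such a filter would also match many benign addresses, which is why transaction history alone cannot reliably separate malicious addresses from legitimate ones.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tx:
    """A hypothetical, simplified incoming transaction record."""
    address: str
    amount_btc: float  # value received by `address` in this transaction
    time: datetime


def address_features(txs: Iterable[Tx]) -> Dict[str, dict]:
    """Summarize each address: number of transactions, BTC received, active days."""
    by_addr: Dict[str, List[Tx]] = {}
    for tx in txs:
        by_addr.setdefault(tx.address, []).append(tx)

    features = {}
    for addr, items in by_addr.items():
        times = [t.time for t in items]
        features[addr] = {
            "num_transactions": len(items),
            "total_btc": sum(t.amount_btc for t in items),
            # Days between the first and last observed transaction.
            "active_days": (max(times) - min(times)).days,
        }
    return features


def looks_like_ransom_address(f: dict, max_tx: int = 5,
                              max_btc: float = 2.0, max_days: int = 30) -> bool:
    """Hypothetical filter: few transactions, small balance, short activity."""
    return (f["num_transactions"] <= max_tx
            and f["total_btc"] <= max_btc
            and f["active_days"] <= max_days)
```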
In addition to our long-term study, we also evaluate the feasibility of implementing
defense mechanisms against destructive ransomware attacks. We provide an
analysis of the file system activity of ransomware samples that target users' files. Our
analysis shows that different classes of ransomware attacks with multiple levels of
sophistication share very similar characteristics from a file system perspective, due
to the nature of these attacks. Our analysis suggests that when an infected system
is under attack, one can notice a significant change in the file system activity, since
the malicious process generates a large number of similar file system access requests.
Consequently, if we effectively monitor the file system activity (e.g., the changes in
the Master File Table (MFT) and the types of I/O Request Packets (IRPs) sent to the file
system), it is possible to detect multiple different types of destructive ransomware
attacks that target users' files. This contradicts recent discussions in the security
community about the impossibility of detecting or stopping these types of attacks
due to the use of sophisticated destructive techniques [6, 77, 96, 101]. Based on our
analysis, we conclude that detecting and stopping a large number of destructive
ransomware attacks is not as complex as has been reported, and that deploying practical
defense mechanisms against these attacks is possible due to the engineering of the NTFS
file system.
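The monitoring idea above can be illustrated with a per-process sliding window over file system requests. The sketch below is a minimal user-space illustration under stated assumptions, not the kernel module used in this work. It assumes a filter driver already delivers events as (pid, IRP major function, path, timestamp) records. The `IrpEvent` and `ActivityMonitor` names, the 10-second window, and the 50-file threshold are all hypothetical. `IRP_MJ_WRITE` and `IRP_MJ_SET_INFORMATION` are real Windows IRP major function codes (the latter covers renames and delete dispositions), but treating them as the "destructive" set is an assumption made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass(frozen=True)
class IrpEvent:
    """A hypothetical file system request record reported by a filter driver."""
    pid: int
    major: str        # IRP major function, e.g. "IRP_MJ_WRITE"
    path: str
    timestamp: float  # seconds


# Request types that can overwrite, rename, or delete user files.
DESTRUCTIVE = {"IRP_MJ_WRITE", "IRP_MJ_SET_INFORMATION"}


class ActivityMonitor:
    """Flags a process that modifies many distinct files within a short window."""

    def __init__(self, window: float = 10.0, min_files: int = 50) -> None:
        self.window = window
        self.min_files = min_files
        self._recent: Dict[int, Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, ev: IrpEvent) -> bool:
        """Record one event; return True if its process crosses the threshold."""
        if ev.major not in DESTRUCTIVE:
            return False
        q = self._recent[ev.pid]
        q.append((ev.timestamp, ev.path))
        # Drop events that have fallen out of the sliding window.
        while q and ev.timestamp - q[0][0] > self.window:
            q.popleft()
        distinct_files = {path for _, path in q}
        return len(distinct_files) >= self.min_files
```

A process that repeatedly saves the same document never accumulates many distinct paths, whereas a process that rewrites dozens of different documents within seconds does. A practical system would still need additional signals to separate ransomware from benign bulk tools such as archivers.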
In summary, the contributions of this work are as follows:
• We analyzed 1,359 ransomware samples, describing previously undocumented
aspects of ransomware attacks with a focus on distinctive and common
behaviors among different families.
• We explain how the core parts of ransomware samples are engineered and how
these findings can potentially be used to detect these attacks. Our analysis
shows that abnormal file system activity can be accurately monitored in
destructive ransomware attacks with different levels of sophistication.
• We perform an analysis of charging methods adopted among ransomware
families and also investigate how cybercriminals used cryptocurrency in recent
ransomware attacks. Our analysis of illicitly-gained Bitcoins suggests that
cybercriminals adopted multiple evasive techniques to protect their privacy in the
Bitcoin network, making the tracing procedure significantly more difficult.
• We suggest avenues that can be used to defend against a large number of
destructive zero-day ransomware attacks. We propose a general methodology to
detect these attacks without making any assumptions about how they attack
users' files.
The rest of the chapter is structured as follows. In Section 2.2, we present our data
set and the ransomware families we categorized. In Section 2.3, we present the experiments
we conducted and discuss our findings. In Section 2.4, we discuss the financial incentives
and payment methods. In Section 2.5, we briefly present related work. Finally, we
conclude the chapter in Section 2.6.
2.2 Ransomware Data Set
Since collecting the malware data set was a critical part of our research, in this
section we provide some details about our ransomware sample selection procedure.
To build a comprehensive ransomware data set, we collected malware samples from
multiple sources. We obtained 37.9% of our samples from Anubis, and 48.38% were
collected by automatically crawling public malware repositories [4, 2, 1]. We captured
the remaining 13.8% by manually browsing through security forums [71, 3].
We collected 3,921 ransomware samples from all those sources. However, after
removing the samples that did not execute properly in our environment and those for
which we were not able to find a release date, our data set contained a total of 1,359
active ransomware samples. To obtain accurate labels for these samples, we cross-checked
the malware samples by automatically submitting the list of MD5 hashes to
VirusTotal. To be conservative in our ransomware selection, we consider a
sample to be ransomware only if at least three AV engines recognized it as belonging to
this category.
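The conservative labeling rule above amounts to counting the engines whose label indicates ransomware and requiring at least three. Here is a minimal sketch. It assumes the per-engine labels have already been retrieved from a VirusTotal report as a mapping from engine name to label. The keyword list that decides whether a label "belongs to this category" is a hypothetical stand-in, since the exact matching criterion is not specified here.

```python
from typing import Mapping, Optional

# Hypothetical substrings taken to indicate a ransomware label.
RANSOM_KEYWORDS = ("ransom", "filecoder", "lockscreen", "winlock")


def is_ransomware(av_labels: Mapping[str, Optional[str]],
                  min_engines: int = 3) -> bool:
    """Return True if at least `min_engines` engines give a ransomware label.

    `av_labels` maps an AV engine name to its label, or None if the engine
    did not flag the sample.
    """
    hits = sum(
        1
        for label in av_labels.values()
        if label and any(k in label.lower() for k in RANSOM_KEYWORDS)
    )
    return hits >= min_engines
```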
To obtain the family names, we parsed the naming schemes that AV vendors
commonly use to assign malware labels. For 77% of samples, AV engines followed
the same labeling scheme, and our naming policy was mainly based on the popularity
of the family name in the community (e.g., Gpcode, Reveton). The remaining 23% of
the samples were labeled inconsistently across the different antivirus products,
and in this case we simply selected the most common label among the top
39 AV engines. For example, some samples were labeled both as Pornoasset and as
Tobfy by top AV engines, but we labeled these samples as Tobfy due to the perceived
popularity of the label.
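For the inconsistently labeled samples, the "most common label" rule can be sketched as a vote over family tokens extracted from each engine's label. This is an illustrative sketch only. The tokenization and the list of generic tokens to ignore are assumptions, and real AV naming schemes vary enough that a production parser would need per-vendor rules.

```python
import re
from collections import Counter
from typing import List, Mapping, Optional

# Hypothetical generic tokens that do not identify a family.
GENERIC_TOKENS = {"trojan", "ransom", "win32", "generic", "malware",
                  "agent", "lockscreen", "variant"}


def family_tokens(label: str) -> List[str]:
    """Split a label such as 'Trojan-Ransom.Win32.Tobfy.abc' into candidate names."""
    tokens = re.split(r"[^A-Za-z0-9]+", label)
    return [
        t.lower()
        for t in tokens
        # Skip short suffixes (e.g. variant IDs), generic words, and numbers.
        if len(t) > 3 and t.lower() not in GENERIC_TOKENS and not t.isdigit()
    ]


def family_name(av_labels: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the family token named by the most engines, or None."""
    votes: Counter = Counter()
    for label in av_labels.values():
        if label:
            # Count each family at most once per engine.
            votes.update(set(family_tokens(label)))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```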
To the best of our knowledge, our analysis covers the majority of the existing
ransomware families observed between 2006 and 2014. However, as our data collection
module relies on external sources, we are aware that we may have missed some
types of ransomware attacks. Furthermore, in order to conduct balanced experiments
over the ransomware families, and also to avoid biased results due to polymorphic
techniques, we performed our analysis not only on individual samples, but also
on families and distinct variants per family. Table 2.1 shows the total
number of distinct samples per family as well as the distinct variants in each family. It
also shows when each family first appeared in the wild and when the most recent sample
in our data set was seen.
As Table 2.1 shows, there was a rapid emergence of new families
between 2012 and 2014, as well as significant growth in the number of new samples
in each family. This may be due to a bias in the data set toward more recent
samples, or to the multiplication of samples due to polymorphism in newer families
(e.g., Winlock, Urausy, and Reveton). The table also shows the types of ransomware
attacks we observed in each family in our data set (in addition to locking the user
desktop). In particular, we observed that 61.22% of the samples (57 variants) only
targeted the desktop of compromised computers, without touching the documents in
the file system. More details on the locking procedure are discussed in Section 2.3.1.
Encrypting the victim's files in addition to locking the desktop was observed in 5.37%
of samples in four families (Cryptolocker, CryptoWall, Filecoder, and Gpcode).

Table 2.1: The list of malware families used in our experiments.

Family        Samples        Variants   First Seen   Most Recent   Types of Attacks
Reveton       244 (17.95%)   14         2012         2014          ✓ ✓
Cryptolocker  32 (2.35%)     4          2013         2014          ✓ ✓
CryptoWall    11 (0.81%)     2          2014         2014          ✓
Tobfy         122 (8.97%)    12         2010         2014          ✓
Seftad        23 (1.69%)     4          2006         2010          ✓
Winlock       308 (22.66%)   27         2008         2013          ✓
Loktrom       4 (0.29%)      2          2012         2013          -
Calelk        9 (0.66%)      2          2009         2010          -
Urausy        523 (38.48%)   16         2009         2014          ✓ ✓
Krotten       17 (1.25%)     3          2008         2009          -
BlueScreen    4 (0.29%)      1          2008         2009          -
Kovter        8 (0.58%)      2          2013         2013          ✓
Filecoder     9 (0.66%)      3          2012         2014          ✓ ✓
GPcode        21 (1.54%)     4          2004         2008          ✓
Weelsof       24 (1.76%)     3          2012         2013          -

Types of attacks: Encrypting Files, Changing MBR, Deleting Files, Stealing Info.
Totals over 1,359 samples: Encrypting Files 73 (5.37%), Changing MBR 23 (1.69%),
Deleting Files 484 (35.61%), Stealing Info 44 (3.23%).
Totals over 99 variants: Encrypting Files 13 (13.13%), Changing MBR 4 (4.04%),
Deleting Files 29 (21.33%), Stealing Info 6 (6.06%).
We also observed other malicious activities, such as changing the
browser settings or performing multiple infections to install other malware, in 3.23%
of the samples. Although the number of samples performing additional
malicious activities (e.g., stealing private information) is not alarmingly high, this
phenomenon is increasing. For example, our analysis shows that information
stealing was first seen in Reveton in early 2012, but other families such as Kovter,
Urausy, and Cryptolocker started to add information-stealing capabilities to their
samples after that date [41, 63]. We provide more details on the malicious behaviors
among ransomware families in Section 2.3.
29
2.2.1 Experimental Setup
We performed all malware execution experiments according to common scientific guidelines [94] inside a Cuckoo Sandbox [37] running Windows XP SP3 32-bit, with controlled access to the Internet via NAT. Network traffic (e.g., IRC, DNS, and HTTP) was allowed in order to enable command and control (C&C) communication. To contain harmful traffic (e.g., spam) during the experiments, we redirected it to a local honeypot. The network bandwidth was also reduced to mitigate potential DoS attacks.
The environment installed inside the malware analysis system includes typical data found in a user session, such as saved credentials, browser history, and other customizations. We also emulated basic user activity by running a script in each malware run (e.g., opening a window, moving the mouse, opening a website). We then executed each sample in the analysis environment for 45 minutes to capture its execution traces. Since current ransomware samples typically start attacking the user's files right after the malicious program is executed, we believe that the 45-minute threshold is sufficient for most ransomware samples to exhibit their malicious behavior. After each execution, the entire system was rolled back to a clean state to prevent any interference across executions.
2.3 Characterization and Evolution
In this section, we describe our findings based on the types of malicious activities detected in ransomware samples during our experiments. We partition the malicious activities into multiple categories and discuss our findings for each of them.
2.3.1 File System Activity
One of our first goals was to describe how a malicious process interacts with the file system when a compromised computer is under a ransomware attack. To answer this question, we investigate the common characteristics of ransomware attacks from a file system perspective, regardless of the technical differences these attacks might have (such as the infection and key generation techniques). Multiple approaches could be used to monitor file system activity. One classic approach is to hook the SSDT [46, 66] to monitor interesting function calls. In our analysis, we developed a minifilter driver [78] to capture all I/O requests that the I/O manager generates on behalf of user-mode processes to access the file system.
To monitor the I/O requests, the minifilter driver registers callback routines with the filter manager. In our analysis, we defined pre-operation and post-operation callback routines for all IRP functions in order to precisely record any I/O and transaction activity on the files. For each file system request, we collected the process name, the process ID, the parent process ID, the pre-operation and post-operation callback times, the IRP type, the arguments, and the result of the operation. Each record is a tuple:
<PName,PID,PPID,PreOpTime,PostOpTime,IRPFlag,Args,Result>
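As an illustration of how such records could be consumed in user mode, the following Python sketch parses one serialized record into a typed structure. The serialization format (the tuple printed as comma-separated fields between angle brackets, with times as floating-point numbers) is an assumption made for this example, not necessarily the format our driver emits; the parser only assumes that PName contains no commas, so Args may contain commas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrpRecord:
    """One request: <PName,PID,PPID,PreOpTime,PostOpTime,IRPFlag,Args,Result>."""
    pname: str
    pid: int
    ppid: int
    pre_op_time: float
    post_op_time: float
    irp_flag: str
    args: str
    result: str

    @property
    def latency(self) -> float:
        # Time between the pre-operation and post-operation callbacks.
        return self.post_op_time - self.pre_op_time


def parse_record(line: str) -> IrpRecord:
    """Parse a record serialized as "<f1,f2,...,f8>" (hypothetical format)."""
    line = line.strip()
    if not (line.startswith("<") and line.endswith(">")):
        raise ValueError(f"not a record: {line!r}")
    # The first six fields are assumed to contain no commas.
    head = line[1:-1].split(",", 6)
    if len(head) != 7:
        raise ValueError(f"too few fields: {line!r}")
    # Args may contain commas, so Result is taken from the right-hand end.
    args, result = head[6].rsplit(",", 1)
    pname, pid, ppid, pre, post, irp_flag = head[:6]
    return IrpRecord(pname, int(pid), int(ppid), float(pre), float(post),
                     irp_flag, args, result)


rec = parse_record("<mal.exe,1234,567,10.0,10.5,IRP_MJ_WRITE,offset=0,len=4096,SUCCESS>")
print(rec.irp_flag, rec.args, rec.result, rec.latency)
```

A frozen dataclass keeps records immutable once parsed, which suits traces that are only appended to and analyzed afterwards.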
The minifilter with different callback routines allows us to capture all read, write, and attribute change requests to the file system at the closest possible level to the file system driver. Our minifilter driver runs in privileged kernel mode, which has access to nearly all objects of the operating system. Furthermore, since we captured the file system activity directly from the I/O manager in the kernel, there was little chance that cybercriminals could bypass our monitor. When looking at the execution traces of the malware programs in the analysis environment, we observed that the way malicious processes generate requests to access the file system was significantly different from that of benign processes. By closely examining the file system activity of multiple ransomware samples, we were able to distinguish multiple attack strategies that ransomware families used while the system was under attack. We discuss our findings in the following sections.
Encryption Mechanisms
As presented in Table 2.1, 5.37% of the samples among four families employed some encryption mechanism during the experiments. Our analysis shows that existing ransomware samples use both customized and standard cryptosystems during the attacks. The customized cryptosystems are not necessarily more reliable or complicated than the standard cryptosystems that Windows platforms provide (e.g., CryptoAPI). Cybercriminals develop their own cryptosystems for multiple reasons. One reason is probably to decrease the chance of being easily detected by common malware analysis techniques (e.g., PE header checking, hooking standard API functions). A key requirement for crypto-style ransomware is to reliably minimize the chance of recovering the original data once the encrypted files have been generated. Some modern crypto-style ransomware families, such as Cryptolocker and CryptoWall, use standard Windows functions to perform their file encryption. They simply call CryptEncrypt with a handle to the encryption key and a pointer to a buffer that contains the plaintext to be encrypted. In these families, the plaintext in the buffer is directly overwritten with the encrypted data created by this function. As depicted in Table 2.2, the I/O manager generates IRP_MJ_CREATE on behalf of the malicious process to open the user's file. The file content is read via IRP_MJ_READ for encryption and is overwritten with the ciphertext buffer using IRP_MJ_WRITE each time a file encryption occurs.

Table 2.2: The IRP requests generated during a CryptoWall attack.
Process Name Operation Path Result
mal.exe IRP_MJ_CREATE E:\MySubmissions SUCCESS
mal.exe IRP_MJ_DIRECTORY_CONTROL E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_READ E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_WRITE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_READ E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_WRITE E:\MySubmissions\dimva2015-submission.tex SUCCESS
...
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_SET_INFORMATION E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\dimva2015-submission.tex SUCCESS
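To make the read-then-overwrite pattern of Table 2.2 concrete, the following Python sketch scans a simplified stream of (PID, IRP type, path) events, a hypothetical reduction of the minifilter records, and reports, per process, the files that were read and later written through the same path. On its own this is a weak signal, since an editor saving a document produces the same pattern; it is the number of such files per process over a short time that would matter in practice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (pid, irp_type, path): a simplified view of the minifilter records.
Event = Tuple[int, str, str]


def in_place_overwrites(events: Iterable[Event]) -> Dict[int, Set[str]]:
    """Per process, return the paths that were read and later written in place."""
    read_paths: Dict[int, Set[str]] = defaultdict(set)
    overwritten: Dict[int, Set[str]] = defaultdict(set)
    for pid, irp_type, path in events:
        if irp_type == "IRP_MJ_READ":
            read_paths[pid].add(path)
        elif irp_type == "IRP_MJ_WRITE" and path in read_paths[pid]:
            overwritten[pid].add(path)
    return dict(overwritten)


doc = r"E:\MySubmissions\dimva2015-submission.tex"
trace = [
    (1234, "IRP_MJ_CREATE", doc),
    (1234, "IRP_MJ_READ", doc),
    (1234, "IRP_MJ_WRITE", doc),
    (1234, "IRP_MJ_SET_INFORMATION", doc),
    (1234, "IRP_MJ_CLOSE", doc),
]
print(in_place_overwrites(trace))
```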
We also observed that even if the samples do not use standard cryptosystems, it is still possible to recognize how they attack users' files. For instance, a member of the Filecoder family uses a simple customized approach to encrypt files. Unlike Cryptolocker and CryptoWall, the sample first generates an encrypted version of a file using an AES-256 encryption key and then overwrites the original file's data with the encrypted file. Table 2.3 shows how the malicious process interacts with the file system to encrypt an arbitrary file when the system is under attack. The types of IRPs generated by the malicious process show how a ransomware sample targets the victim's files. For example, the sequence of IRPs shows that the ransomware sample first queries the given location to find the user's file and creates handles to the original and encrypted files. The file's data is read via an IRP_MJ_READ IRP, and the encrypted data buffer is written to the destination file via an IRP_MJ_WRITE IRP. Subsequently, IRP_MJ_SET_INFORMATION is used to delete the original file after it is closed and also to overwrite the original file with the encrypted file. The sequence of IRPs shown in Table 2.3 is repeated for every file on the infected system.

Table 2.3: The types of IRPs requested by a malicious process to encrypt and overwrite the victim's files during a ransomware attack.
Process Name Operation Path Result
mal.exe IRP_MJ_CREATE E:\MySubmissions SUCCESS
mal.exe IRP_MJ_DIRECTORY_CONTROL E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
mal.exe IRP_MJ_READ E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_READ E:\MySubmissions\dimva2015-submission.tex SUCCESS
...
mal.exe IRP_MJ_WRITE E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
...
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\dimva2015-submission.tex SUCCESS
...
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_SET_INFORMATION E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\dimva2015-submission.tex SUCCESS
...
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
mal.exe IRP_MJ_SET_INFORMATION E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\dimva2015-submission.tex.crypt SUCCESS
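The Filecoder pattern in Table 2.3 differs in that the ciphertext goes to a second file before the original is removed. The following Python sketch uses the same simplified (PID, IRP type, path) events and assumes, as in Table 2.3, that the encrypted copy's name extends the original name (e.g., a .crypt suffix); it flags originals that a process read, then wrote a derived copy of, and finally changed through IRP_MJ_SET_INFORMATION. The naming rule is a hypothetical simplification, and a sample that gives its output unrelated names would evade it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[int, str, str]  # (pid, irp_type, path)


def copy_then_replace(events: Iterable[Event]) -> Dict[int, Set[str]]:
    """Per process, return originals replaced by a derived (e.g. '.crypt') copy."""
    read: Dict[int, Set[str]] = defaultdict(set)
    written: Dict[int, Set[str]] = defaultdict(set)
    replaced: Dict[int, Set[str]] = defaultdict(set)
    for pid, irp_type, path in events:
        if irp_type == "IRP_MJ_READ":
            read[pid].add(path)
        elif irp_type == "IRP_MJ_WRITE":
            written[pid].add(path)
        elif irp_type == "IRP_MJ_SET_INFORMATION" and path in read[pid]:
            # Assumed naming rule: the copy's path starts with the original's path.
            if any(w != path and w.startswith(path) for w in written[pid]):
                replaced[pid].add(path)
    return dict(replaced)


src = r"E:\MySubmissions\dimva2015-submission.tex"
dst = src + ".crypt"
trace = [
    (42, "IRP_MJ_CREATE", src),
    (42, "IRP_MJ_CREATE", dst),
    (42, "IRP_MJ_READ", src),
    (42, "IRP_MJ_WRITE", dst),
    (42, "IRP_MJ_CLEANUP", dst),
    (42, "IRP_MJ_SET_INFORMATION", src),
    (42, "IRP_MJ_SET_INFORMATION", dst),
]
print(copy_then_replace(trace))
```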
Another sample from Filecoder uses the defragmentation API to get raw access to each file's data based on the volume sector and cluster size. The sample overwrites the files with custom data patterns based on how the files are laid out on the disk. For example, if the file mapping check shows that the file has multiple extents, the physical disk offsets of each extent are retrieved and overwritten with the custom data pattern. If the file does not have any extents, the file is small and its data is stored in its own entry in the MFT. The malware uses DeviceIoControl from kernel32.dll to get the file map on the physical disk. Figure 2-1 shows how a malicious process finds the file's data and overwrites it after encryption. When NTFS finds the file record in the MFT, it obtains the VCN-to-LCN mapping information from the file record's data attribute. Consequently, the malicious process can easily retrieve this information and locate the file's data on the disk.
[Figure: the $MFT record of Dimva2015.tex (STANDARD INFORMATION, FILE NAME, and data runs: starting VCN 0 at starting LCN 1742 for 4 clusters, starting VCN 4 at starting LCN 1794 for 4 clusters, so VCNs 0 to 7 map to LCNs 1742 to 1745 and 1794 to 1797); the infected process's handle table and file object; SetFilePointerEx to the beginning of the file; overwriting the original file's data with a custom data pattern after encryption.]
Figure 2-1: The malicious process attempts to get the file map on the physical disk to overwrite the file's data after the encryption.
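The VCN-to-LCN translation in Figure 2-1 is simple arithmetic over the file's extents. The following Python sketch uses the two extents drawn in the figure and models the extent list on the (NextVcn, Lcn) pairs that Windows returns for the FSCTL_GET_RETRIEVAL_POINTERS control code; the sketch itself does not call any Windows API, and the 4096-byte cluster size is an assumed example value. Multiplying an LCN by the cluster size gives the byte offset from the start of the volume, which is where a raw overwrite would land.

```python
from typing import List, Tuple

# Each extent is (next_vcn, lcn): it starts at the previous extent's next_vcn
# (or at starting_vcn for the first one) and maps contiguously from lcn onward.
Extent = Tuple[int, int]


def vcn_to_lcn(starting_vcn: int, extents: List[Extent], vcn: int) -> int:
    """Translate a virtual cluster number of a file into a logical cluster number."""
    current = starting_vcn
    for next_vcn, lcn in extents:
        if current <= vcn < next_vcn:
            return lcn + (vcn - current)
        current = next_vcn
    raise ValueError(f"VCN {vcn} is not mapped by this extent list")


def volume_offset(lcn: int, cluster_size: int) -> int:
    """Byte offset of a logical cluster from the start of the volume."""
    return lcn * cluster_size


# Extents of Dimva2015.tex as drawn in Figure 2-1: 4 clusters at 1742, 4 at 1794.
extents = [(4, 1742), (8, 1794)]
print([vcn_to_lcn(0, extents, v) for v in range(8)])  # the LCN row of Figure 2-1
print(volume_offset(1742, 4096))  # 4096-byte clusters are an assumed value
```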
Encryption techniques (e.g., key generation and key management) in crypto-style ransomware families have also evolved significantly. For example, a Gpcode variant generates a static key during the attack, and this key is used to encrypt all non-system files. Finding the encryption key in this variant is fairly simple: we were able to retrieve the key by comparing an encrypted file with the original one. The most recent Gpcode variant in our data set encrypts the files using a unique AES-256 encryption key, which is then encrypted using a 1024-bit RSA public key. Another change we observed over time is where the asymmetric key pair is generated. For example, in a sample (md5:ffcf2bb69f23c7c234d2f2ee380cdaa4) created in 2012, the master key is generated locally on the compromised computer and can be extracted by inspecting memory.

Table 2.4: A set of IRP requests generated on behalf of a malicious process to delete files during an attack.
Process Name Operation Path Result
mal.exe IRP_MJ_DIRECTORY_CONTROL E:\* SUCCESS
mal.exe IRP_MJ_CLEANUP E:\ SUCCESS
mal.exe IRP_MJ_CLOSE E:\ SUCCESS
mal.exe IRP_MJ_CREATE E:\ SUCCESS
mal.exe IRP_MJ_DIRECTORY_CONTROL E:\MySubmissions\* SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\ SUCCESS
mal.exe IRP_MJ_CREATE E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_DIRECTORY_CONTROL E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_SET_INFORMATION E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLEANUP E:\MySubmissions\dimva2015-submission.tex SUCCESS
mal.exe IRP_MJ_CLOSE E:\MySubmissions\dimva2015-submission.tex SUCCESS

The use of RSA keys with different key lengths in Cryptolocker was previously reported [88], but at the time of writing, we observed only samples with a 1024-bit RSA public key in our data set. The RSA public key is generated remotely on the C&C server once the compromised computer successfully sends a POST request to the C&C servers. If the sample cannot connect to the C&C servers, the malicious behavior is not triggered. The sample md5:04fb36199787f2e3e2135611a38321eb only encrypted users' files on the logical drives attached to the system. A later evolution in this family is the encryption of connected drives: the sample md5:f1e2de2a9135138ef5b15093612dd813 encrypts all non-system files, including those on network shares, to minimize the possibility of recovering files without paying the ransom. These ransomware samples simply employ GetLogicalDrives, GetDriveType, or similar functions to find network drives.
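From a defender's point of view, it helps to enumerate which drives such a sample can reach. The following Python sketch decodes a bitmask in the format returned by GetLogicalDrives (bit 0 for A:, bit 1 for B:, and so on) and filters the drives by their GetDriveType code. The drive types are passed in as a function so that the sketch runs on any platform, whereas on Windows they would come from GetDriveType itself; the include_network flag and the example drive letters are hypothetical.

```python
from typing import Callable, List

# GetDriveType return codes as defined in the Windows SDK (subset).
DRIVE_REMOVABLE = 2
DRIVE_FIXED = 3
DRIVE_REMOTE = 4


def drives_from_bitmask(mask: int) -> List[str]:
    """Decode a GetLogicalDrives bitmask: bit 0 is A:, bit 1 is B:, and so on."""
    return [f"{chr(ord('A') + i)}:" for i in range(26) if mask & (1 << i)]


def reachable_targets(mask: int, drive_type: Callable[[str], int],
                      include_network: bool) -> List[str]:
    """Drives whose files would be exposed; include_network is a hypothetical
    switch contrasting the earlier (local only) and later (network too) samples."""
    wanted = {DRIVE_FIXED, DRIVE_REMOVABLE}
    if include_network:
        wanted.add(DRIVE_REMOTE)
    return [d for d in drives_from_bitmask(mask) if drive_type(d) in wanted]


# C: fixed disk, E: removable drive, Z: mapped network share (example values).
drive_types = {"C:": DRIVE_FIXED, "E:": DRIVE_REMOVABLE, "Z:": DRIVE_REMOTE}
mask = (1 << 2) | (1 << 4) | (1 << 25)
print(reachable_targets(mask, drive_types.__getitem__, include_network=False))
print(reachable_targets(mask, drive_types.__getitem__, include_network=True))
```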
Deletion Mechanisms
In this part, we specifically discuss file deletion mechanisms that are unique to ransomware attacks. 35.6% of the samples, belonging to five common ransomware families, do not perform any encryption. Instead, they delete the user's files if the user does not pay the ransom. In addition, we observed that certain samples in the Gpcode and Filecoder families deleted the original unencrypted file's data after encryption occurred. Consequently, deletion is a common operation among multiple ransomware families in our data set. Table 2.4 shows a sequence of IRPs collected while running a sample from the Filecoder family. The malicious process uses IRP_MJ_DIRECTORY_CONTROL to list the files and then requests to open each file via the Win32 CreateFile function. Any create request is performed by IRP_MJ_CREATE, which returns a handle to the file object. Finally, the file is deleted by IRP_MJ_SET_INFORMATION when the file is closed. We observed very similar approaches in other families, such as Gpcode, Reveton, and Urausy, in spite of differences in other aspects of the attacks.

Figure 2-2: Disk layout for files with different sizes in NTFS file system.
In the NTFS file system, each file has an entry in the Master File Table (MFT) that reflects the changes of the corresponding file or folder [26]. The core file attributes in each MFT entry can be found in the $STANDARD_INFORMATION attribute and in the $DATA attribute, which contains the content of the corresponding file. The content of the $DATA attribute can be resident or non-resident in the MFT entry, depending on the size of the file. Figure 2-2 shows the disk layout for files with different sizes in the NTFS file system. The status of a file is determined by both a flag and a $BITMAP in an MFT entry. $BITMAP manages the information about the allocation status of clusters on the disk.
When a ransomware attack occurs, the malware lists the non-system files and initiates a delete operation for each of them. The MFT entry for each file is updated by changing the file's status flag value from 0x01 to 0x00. Furthermore, the $BITMAP attribute in the MFT file is set to zero for the corresponding file. For large files, since multiple clusters might be allocated, the location of the fragmented data is saved in the runlist stored in the MFT entry. When the file is deleted, the clusters used to hold the file's data are marked as unallocated in the $BITMAP attribute in the MFT file. Consequently, when a file is deleted in a typical ransomware attack, the MFT entry is updated, but the content of the file is not erased immediately. Therefore, our analysis suggests that we can detect ransomware attacks that target users' files based on the changes in the MFT, and also recover the content associated with the deleted files, owing to the design of the NTFS file system. Finally, Figure 2-3 shows the delete operation from a different perspective: the malicious process tries to delete a large file that is fragmented across multiple clusters.

Figure 2-3: A ransomware attack (Gpcode) with a simple delete operation.
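The status-flag change described above can be observed in the raw MFT records. The following Python sketch reads the 16-bit flags field at offset 0x16 of an NTFS FILE record header (bit 0x0001 marks the record as in use, bit 0x0002 marks a directory) and compares two MFT snapshots to list records whose in-use bit was cleared. Representing snapshots as dictionaries from record number to raw bytes is a simplification for illustration; a real monitor would read the $MFT from the volume and would have to handle records that are later reused.

```python
import struct
from typing import Dict, List

IN_USE = 0x0001
IS_DIRECTORY = 0x0002


def record_flags(record: bytes) -> int:
    """Return the flags of an NTFS FILE record (little-endian u16 at offset 0x16)."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT FILE record")
    # Update sequence fixups only replace the last two bytes of each sector,
    # so this header field can be read without applying them.
    (flags,) = struct.unpack_from("<H", record, 0x16)
    return flags


def newly_deleted(before: Dict[int, bytes], after: Dict[int, bytes]) -> List[int]:
    """Record numbers in use in `before` but no longer in use in `after`."""
    return sorted(
        n for n, rec in before.items()
        if n in after and record_flags(rec) & IN_USE
        and not record_flags(after[n]) & IN_USE
    )


def make_record(flags: int) -> bytes:
    # Minimal synthetic 1024-byte record: signature plus the flags field.
    rec = bytearray(1024)
    rec[:4] = b"FILE"
    struct.pack_into("<H", rec, 0x16, flags)
    return bytes(rec)


before = {40: make_record(IN_USE), 41: make_record(IN_USE | IS_DIRECTORY)}
after = {40: make_record(0x0000), 41: make_record(IN_USE | IS_DIRECTORY)}
print(newly_deleted(before, after))
```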
Changing Master Boot Records
One of the ransomware families (Seftad) was developed to attack the Master Boot Record (MBR), which contains the executable boot code and the partition table. The MBR is located in the first sector of a hard disk and is loaded into memory at boot time, when the system transfers control to the code stored in the MBR. Samples that target the MBR prevent the infected system from loading the boot code in the active partition by simply replacing the MBR with a bogus one that displays a message asking for a ransom. Defeating this type of ransomware attack is quite simple. For example, in early samples, the unlock code was hard-coded into the binary and could be acquired by reverse engineering. Following this procedure, we discovered the unlock code in 18 Seftad samples in our data set.
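The classic MBR layout referred to above is fixed: 446 bytes of boot code, four 16-byte partition entries starting at offset 446, and the 0x55AA signature in the last two bytes of the 512-byte sector. The following Python sketch parses that layout and fingerprints the sector, so that a replacement such as the bogus MBR installed by Seftad could be noticed by comparing against a known-good baseline. Reading the sector from the disk (which requires raw device access) is outside the sketch, and the baseline comparison is an illustrative assumption, not part of the analysis described here.

```python
import hashlib
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446
SIGNATURE = b"\x55\xaa"


class Partition(NamedTuple):
    status: int        # 0x80 marks the active (bootable) partition
    type_id: int       # partition type byte, e.g. 0x07 for NTFS
    first_lba: int
    sector_count: int


def parse_mbr(sector: bytes) -> List[Partition]:
    """Parse the four primary partition entries of a classic MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        off = BOOT_CODE_SIZE + 16 * i
        first_lba, count = struct.unpack_from("<II", sector, off + 8)
        entries.append(Partition(sector[off], sector[off + 4], first_lba, count))
    return entries


def fingerprint(sector: bytes) -> str:
    """Hash of boot code and partition table, for comparison with a baseline."""
    return hashlib.sha256(sector[:510]).hexdigest()


good = bytearray(SECTOR_SIZE)
good[510:512] = SIGNATURE
good[446] = 0x80                                   # active partition
good[446 + 4] = 0x07                               # NTFS
struct.pack_into("<II", good, 446 + 8, 2048, 1_000_000)
baseline = fingerprint(bytes(good))

tampered = bytearray(good)
tampered[:16] = b"PAY THE RANSOM!!"                # simulated bogus boot code
print(parse_mbr(bytes(good))[0])
print(fingerprint(bytes(tampered)) != baseline)
```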
Locking Procedure
An important step in a successful ransomware attack is to lock the desktop of the computer under attack. This is typically done by creating a new desktop and making it persistent. Ransomware samples simply use CreateDesktop to create a fresh desktop environment and eliminate unnecessary processes. The new desktop is created with the DESKTOP_SWITCHDESKTOP access right, which enables the SwitchDesktop function to activate the new desktop and receive input from the victim. The desktop is assigned to a thread using the SetThreadDesktop function. A significant number of samples in our data set (61.22%) use very similar approaches to establish a persistent desktop lock.
A small number of samples (8 variants) in families such as Urausy, Reveton, and Winlock employed another approach to lock the desktop. In these families, the lock banner is downloaded as an HTML page with images chosen according to the victim's geographical location, and it is then displayed full screen in an IE window with hidden controls. The banner presents a local law enforcement warning in the language of the victim's geographical location. The warning typically claims that the operating system has been locked due to infringement of certain laws (e.g., distributing copyrighted materials or visiting child pornography sites) in that location.
Disabling certain keyboard shortcuts, such as window toggling (e.g., Windows key + Tab), happens automatically once a new desktop is created, because no other applications are open to toggle through. However, disabling special keys is another part of the locking procedure. This is done by installing hook procedures that monitor keyboard input events. The number of disabled keys differed across ransomware families. For example, 18 variants in Reveton and Urausy disabled the Windows keys to prevent victims from opening the Start menu, and 72 variants among 15 families attempted to disable the Esc key to prevent victims from using keyboard shortcuts (e.g., starting Windows Task Manager) during the attack.
2.3.2 Mitigation Strategies
API Call Monitoring
As discussed in Section 2.3.1, a significant number of ransomware samples use Windows API functions to lock the victim's desktop. These API calls can be used to model application behavior and to train a classifier that detects suspicious sequences of Windows API calls. This approach is not necessarily novel, but it would allow us to stop a large number of ransomware attacks that are produced with little technical effort. For example, the call sequence GetThreadDesktop, CreateDesktopW, and SwitchDesktop can be used as a feature for such a classifier. Of course, cybercriminals might be able to evade detection using different techniques. For example, they may use native APIs to lock the system under attack directly. However, implementing such ransomware samples requires significant work, since the native APIs are not properly documented and may change across Windows versions, which can limit the portability of the attack.
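As a minimal illustration of the feature mentioned above, the following Python sketch checks whether an API-call trace contains GetThreadDesktop, CreateDesktopW, and SwitchDesktop in that order, possibly interleaved with other calls. It is a single hand-written rule standing in for the trained classifier, and the trace format (a plain list of API names) and the example traces are assumptions.

```python
from typing import Iterable, Sequence

LOCK_SEQUENCE = ("GetThreadDesktop", "CreateDesktopW", "SwitchDesktop")


def contains_in_order(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `trace` as an ordered, not necessarily
    contiguous, subsequence."""
    calls = iter(trace)
    # `step in calls` consumes the iterator up to the first match, so each
    # step is searched for only after the previous one has been found.
    return all(step in calls for step in pattern)


locker = ["GetThreadDesktop", "RegOpenKeyExW", "CreateDesktopW",
          "SetThreadDesktop", "SwitchDesktop"]
other = ["CreateDesktopW", "GetThreadDesktop", "CloseDesktop"]
print(contains_in_order(locker, LOCK_SEQUENCE), contains_in_order(other, LOCK_SEQUENCE))
```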
Monitoring File System Activity
Our analysis also suggests that it is possible to detect ransomware attacks, even those using deletion and encryption capabilities, based on our findings in Section 2.3.1. Our analysis shows that significant changes occur in file system activity (e.g., a large number of similar encryption or deletion requests) when the system is under a ransomware attack. By closely monitoring the MFT, one can detect the creation, encryption, or deletion of files. For example, when the system is under a ransomware attack, a significant number of status changes occur in the MFT entries of the deleted files within a very short period of time. For encrypted files, we notice a large number of MFT entries with encrypted content in the $DATA attribute of files that do not share the same path (e.g., files within a directory). In our definition, a malicious MFT entry is an MFT entry that is generated or modified in a system under a ransomware attack. A classifier can be trained on benign and malicious MFT entries to detect abnormal file system activity when the system is under attack.
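One common proxy for "encrypted content" in a $DATA attribute, used here only as an illustrative stand-in for the features such a classifier might learn, is byte-level Shannon entropy: well-encrypted data is close to 8 bits per byte, while most text documents are far lower. The following Python sketch computes it; the 7.5 bits-per-byte threshold and the minimum size are hypothetical values. Compressed formats (e.g., ZIP archives or JPEG images) also have high entropy, so this feature alone would produce false positives.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5, min_size: int = 256) -> bool:
    """Heuristic: enough bytes and a near-uniform byte distribution.
    Both default values are hypothetical."""
    return len(data) >= min_size and shannon_entropy(data) >= threshold


print(shannon_entropy(b"hello hello hello"))
print(shannon_entropy(bytes(range(256)) * 4))
print(looks_encrypted(b"\\documentclass{article}\n" * 50))
```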
To distinguish between benign and malicious file system activity, another possible approach is to monitor all the file system requests that user-mode processes generate. A system with protection capabilities can intercept all requests and discard suspicious ones before they reach the file system driver.
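The interception idea can be sketched as a per-process sliding-window counter that starts denying a process's requests once it exceeds a rate limit. In the following Python sketch, treating IRP_MJ_WRITE and IRP_MJ_SET_INFORMATION as "modifying", the limit, and the window length are all hypothetical choices. In a real system this decision would be made in the minifilter's pre-operation callback, and a fixed rate threshold could be evaded by the slower, user-mimicking attacks discussed under decoy resources.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set

MODIFYING = {"IRP_MJ_WRITE", "IRP_MJ_SET_INFORMATION"}


class RateLimiter:
    """Deny requests from a process once it issues too many modifying
    requests within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._times: Dict[int, Deque[float]] = defaultdict(deque)
        self.blocked: Set[int] = set()

    def allow(self, pid: int, irp_type: str, now: float) -> bool:
        if pid in self.blocked:
            return False
        if irp_type not in MODIFYING:
            return True
        times = self._times[pid]
        times.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while now - times[0] > self.window:
            times.popleft()
        if len(times) > self.max_requests:
            self.blocked.add(pid)  # every later request of this process is denied
            return False
        return True


limiter = RateLimiter(max_requests=3, window_seconds=1.0)
print([limiter.allow(7, "IRP_MJ_WRITE", t) for t in (0.0, 0.1, 0.2, 0.3)])
print(limiter.allow(7, "IRP_MJ_READ", 5.0), limiter.allow(8, "IRP_MJ_WRITE", 5.0))
```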
Recovering the deleted files after a ransomware attack would also be possible. If the $DATA attribute is resident in the MFT entry, the content of the file can simply be copied to another location. For non-resident $DATA attributes, we need to parse the runlist in the MFT entry, copy the raw data to another location, and perform the recovery. In any case, early detection of the attack is critical to successfully recovering the content of deleted files, since the deallocated clusters can be reallocated to new files, in which case the content of the deleted files is overwritten. This approach can be applied to most ransomware samples, whether they use customized or standard cryptosystems, since file-level activity is a common characteristic of ransomware samples that target users' files.
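The runlist parsing step can be sketched as follows. In NTFS, a non-resident attribute's runlist is a sequence of variable-length entries: the low nibble of each header byte gives the size of the run-length field and the high nibble the size of the run-offset field, the offset is a signed value relative to the previous run's starting LCN, and a zero header byte ends the list. The following Python sketch decodes such a runlist into (LCN, cluster count) pairs and turns them into volume byte ranges to copy; the example bytes were constructed by hand to encode the two runs of Figure 2-1, and the 4096-byte cluster size is an assumed value.

```python
from typing import List, Optional, Tuple

Run = Tuple[Optional[int], int]  # (starting LCN, or None if sparse; length in clusters)


def decode_runlist(data: bytes) -> List[Run]:
    """Decode an NTFS runlist (mapping pairs) into absolute (LCN, length) runs."""
    runs: List[Run] = []
    pos, lcn = 0, 0
    while pos < len(data) and data[pos] != 0:
        header = data[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))  # sparse run: nothing allocated on disk
        else:
            # The offset is signed and relative to the previous run's LCN.
            lcn += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            runs.append((lcn, length))
        pos += off_size
    return runs


def byte_ranges(runs: List[Run], cluster_size: int) -> List[Tuple[int, int]]:
    """Volume byte ranges (offset, size) to copy when recovering the data."""
    return [(start * cluster_size, count * cluster_size)
            for start, count in runs if start is not None]


# 0x21: 1-byte length, 2-byte offset -> 4 clusters at LCN 0x06CE (1742).
# 0x11: 1-byte length, 1-byte offset -> 4 clusters at 1742 + 0x34 (1794).
runlist = bytes([0x21, 0x04, 0xCE, 0x06, 0x11, 0x04, 0x34, 0x00])
print(decode_runlist(runlist))
print(byte_ranges(decode_runlist(runlist), 4096))
```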
Using Decoy Resources
The attack strategies adopted to encrypt or delete user files are very similar among ransomware families. For example, the malicious process aggressively attacks all files (in different paths and with different extensions) and tries to encrypt and/or delete them in a very short period of time. Therefore, it is possible to define a file system activity model that reflects normal interaction with the file system. However, cybercriminals could try to evade detection by launching attacks while mimicking normal user behavior. For example, a cybercriminal may avoid aggressively encrypting all files and instead start by encrypting files with recent access or modification times. Attacks like this might not be detected by approaches that monitor the behavior of the system. However, one technique to detect these attacks could be to install decoy files in multiple locations on the disk that are constantly monitored. The use of decoy resources to detect security breaches and insider attacks was first proposed in [24, 112]. Decoy resources have also recently been used to improve the security of hashed passwords [50] and to detect illegally obtained data from file hosting services [83].
In our definition, monitoring decoy files can be an additional layer of defense on top of file system activity monitoring to detect ransomware attacks. The decoy files should be placed at multiple locations in the user environment and should be generated in a way that makes it computationally difficult for an adversary to distinguish them from genuine user files. This approach can increase the chance of detecting the malicious process in the early stages of the attack, regardless of whether the ransomware sample uses novel strategies or customized/standard cryptosystems.
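A minimal version of the decoy check can be sketched in Python: the monitor keeps a set of decoy paths and flags any process that issues a modifying request against one of them. The decoy path, the choice to ignore reads (benign indexers and antivirus scanners read files routinely), and the simplified (PID, IRP type, path) events are assumptions of this sketch. Paths are compared with ntpath.normcase, which lowercases them and converts forward slashes to backslashes, because Windows paths are case-insensitive.

```python
import ntpath
from typing import Iterable, Set, Tuple

MODIFYING = {"IRP_MJ_WRITE", "IRP_MJ_SET_INFORMATION"}


class DecoyMonitor:
    def __init__(self, decoy_paths: Iterable[str]) -> None:
        # Normalize once so lookups are case- and separator-insensitive.
        self._decoys = {ntpath.normcase(p) for p in decoy_paths}

    def is_decoy_hit(self, irp_type: str, path: str) -> bool:
        return irp_type in MODIFYING and ntpath.normcase(path) in self._decoys

    def suspicious_pids(self, events: Iterable[Tuple[int, str, str]]) -> Set[int]:
        return {pid for pid, irp_type, path in events if self.is_decoy_hit(irp_type, path)}


monitor = DecoyMonitor([r"E:\MySubmissions\budget-2015-final.xlsx"])  # hypothetical decoy
events = [
    (900, "IRP_MJ_READ", r"E:\MySubmissions\budget-2015-final.xlsx"),  # e.g. an indexer
    (1234, "IRP_MJ_WRITE", "e:/mysubmissions/BUDGET-2015-FINAL.XLSX"),
]
print(monitor.suspicious_pids(events))
```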
2.4 Financial Incentives
Since the ultimate goal of ransomware attacks is to get money from victims, the payment method is an important aspect of the attacks. Cybercriminals continuously strive to find more reliable charging methods by improving two important properties: (1) the difficulty of tracing the recipient of the payments, and (2) the ease of exchanging payments into a preferred currency. Table 2.5 provides a breakdown of the charging methods used by ransomware families over the past years. Our analysis suggests that sending SMS messages to premium-rate numbers is not confined to old types of ransomware attacks. For example, the charging method in Calelk is still based on premium-rate numbers. The premium-rate numbers were hard-coded in the ransomware sample or were downloaded from the C&C servers in each infection. This class of ransomware attacks requires the least technical background, and when propagated on a large scale, the revenue could be significant.
A large fraction of ransomware samples (88.22%) used prepaid online payment systems such as Moneypak, Paysafecard, and Ukash cards, since they provide limited possibilities to trace the money. These services are not tied to any banking authority, and the owner of the money is anonymous. The ransomware business model takes advantage of these systems because there are no records of the vouchers that could be used to trace cybercriminals. In a typical scenario, once a ransomware criminal receives the vouchers, he can monetize them by selling them in underground voucher exchange forums, over ICQ, or through other untraceable communication channels for less than their nominal value. We also found some unconventional methods used for charging victims. We found two variants of the Kovter family that forced users to buy a software package that unlocked the compromised computer. Figure 2-4 shows the amount charged per family based on our data set. The amount of money required by ransomware owners to unlock the computer varies across variants and families. For example, 48.43% of the samples among the top six families demanded between 150 and 250 dollars.

Table 2.5: Summary of types of charges in 15 ransomware families.
Type of Charge: Premium Number | Untraceable Payments | Online Shopping | Bitcoin Transactions
Reveton ✓ ✓
Cryptolocker ✓ ✓
CryptoWall ✓ ✓
Tobfy ✓
Seftad ✓
Winlock ✓ ✓
Loktrom ✓
Calelk ✓
Urausy ✓ ✓
Krotten ✓
BlueScreen ✓
Kovter ✓ ✓
Filecoder ✓
GPcode ✓
Weelsof ✓
Number of Samples: 132 (9.71%) | 1,199 (88.22%) | 14 (1.03%) | 43 (3.16%)
Number of Variants: 18 (18.2%) | 77 (77.78%) | 4 (4.05%) | 6 (6.06%)
2.4.1 Bitcoin as a Charging Method
Bitcoin provides some unique technical and privacy advantages for the miscreants behind ransomware attacks. Bitcoin transactions are cryptographically signed messages that embody a fund transfer from one public key to another, and only the corresponding private key can be used to authorize the fund transfer. Furthermore, Bitcoin keys are not explicitly tied to real users, although all transactions are public. Consequently, ransomware owners can protect their anonymity and avoid revealing any information that might be used to trace them.

[Figure: bar chart of the amount of ransom requested per family (y-axis, 0 to 600) for the top ransomware families (x-axis): Reveton, Cryptolocker, Winlock, Weelsof, Urausy.]
Figure 2-4: The amount of ransom money among common ransomware families.
We analyzed the use of Bitcoin in recent ransomware attacks in which victims had to buy Bitcoins in order to regain access to their resources. We acquired the Bitcoin addresses by searching the web as well as public forums [89] that hosted discussions on Cryptolocker attacks. Victims typically participated in the discussions by posting information about their infection and the Bitcoin addresses to which they were required to send the ransom. We collected 1,872 Bitcoin addresses during the experiments. We automatically queried the transactions from publicly accessible Bitcoin block explorer websites [23] and parsed the results into a database.
[Figure: CDF; x-axis: No. of Bitcoins (1 to 60); y-axis: No. of Bitcoins per address (CDF), 0.0 to 1.0.]
Figure 2-5: The number of Bitcoins per address.
The number of Bitcoins collected by cybercriminals during the Cryptolocker attacks has been reported previously [97]. Our main focus in this part is to provide insights into how cybercriminals employed Bitcoin to collect ransom fees, based on the transaction history. One of the questions we wanted to answer was whether it is possible to detect illicitly gained Bitcoins based on the transaction history of a Bitcoin address. Our analysis suggests that identifying these Bitcoins is becoming significantly more difficult, since cybercriminals have started to use evasive approaches to protect their privacy (e.g., multiple independent Bitcoin addresses, small Bitcoin amounts, short activity periods, small transaction records) after receiving large volumes of Bitcoins from victims. One reason to use multiple independent addresses with small Bitcoin amounts could be that concealing the source of thousands of illicitly obtained Bitcoins is critical if cybercriminals want to transfer the Bitcoins via recognized exchanges without being noticed. In fact, this is the main evolution in the use of Bitcoin in ransomware attacks, and it makes potential tracing procedures in the Bitcoin network more difficult.
[Figure: CDF; x-axis: Number of Transactions (2 to 62); y-axis: No. of Transactions per address (CDF), 0.0 to 1.0.]
Figure 2-6: The total number of transactions per Bitcoin address.
Our analysis of Bitcoin transactions shows that 84.46% of Bitcoin addresses had no more than six transactions. Furthermore, a significant fraction of these Bitcoin addresses (68.93%) were active for at most 10 days. These addresses were used directly to receive Bitcoins from victims. Another type of address had more transactions and was active for a longer period of time (e.g., more than 10 days); these addresses were used to aggregate the collected ransom fees. Figure 2-5 shows the CDF of the number of Bitcoins per address. In 48.9% of the Bitcoin addresses we analyzed, the address received at most two Bitcoins. These transactions occurred in the early stages of the attacks, when two Bitcoins were worth roughly 200 dollars, equal to the ransom fee required by cybercriminals to send the decryption key.
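The per-address statistics above can be computed with a few lines of code once the transactions are in a database. The following Python sketch assumes a hypothetical, simplified record of (address, date) pairs, one per transaction; it counts transactions and active days per address, labels an address as a collection address when it has at most six transactions and was active for at most 10 days (the two thresholds reported above), and evaluates one point of an empirical CDF. Measuring activity as the number of days between the first and last transaction is our assumption.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def address_profiles(txs: Iterable[Tuple[str, date]]) -> Dict[str, Tuple[int, int]]:
    """Map each address to (number of transactions, days between first and last)."""
    days: Dict[str, List[date]] = defaultdict(list)
    for address, day in txs:
        days[address].append(day)
    return {a: (len(ds), (max(ds) - min(ds)).days) for a, ds in days.items()}


def role(n_tx: int, active_days: int) -> str:
    # Thresholds from the observations above: <= 6 transactions, <= 10 days.
    return "collection" if n_tx <= 6 and active_days <= 10 else "aggregation"


def ecdf(values: List[int], x: int) -> float:
    """Fraction of values less than or equal to x (one point of an empirical CDF)."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical addresses: two payments in three days vs. ten transactions over 27 days.
txs = [("addrA", date(2014, 1, 1)), ("addrA", date(2014, 1, 3))]
txs += [("addrB", date(2014, 1, d)) for d in range(1, 31, 3)]
profiles = address_profiles(txs)
print({a: role(*p) for a, p in profiles.items()})
print(ecdf([n for n, _ in profiles.values()], 6))
```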
[Figure: CDF; x-axis: Number of Transactions (2 to 62); y-axis: No. of Transactions per address (CDF), 0.0 to 1.0.]
Figure 2-7: The duration of activity for Bitcoin addresses.
As shown in Figure 2-6, approximately 72.9% of Bitcoin transactions belong to Bitcoin addresses with two transactions. The incoming transaction was made by victims to pay the ransom, and the outgoing transaction was performed by cybercriminals. The collected Bitcoins were transferred through tens of temporary intermediate accounts or split into many small amounts to be recombined in a new account later, in order to decrease the possibility of tracing the money.
[Figure: histogram; x-axis: Duration of Bitcoin Addresses Activity (days), bins 0-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, >40; y-axis: Percentage of Bitcoin Addresses (%), 0 to 60.]
Figure 2-8: The duration of activity for Bitcoin addresses.
As shown in Figure 2-8, our observations also suggest that Bitcoin addresses used to collect Bitcoins from victims have a relatively short period of activity. This is because the accumulated Bitcoins had to be transferred to other accounts within a few hours or a few days, probably to pass them through mixing services and conceal the source of the money.
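One way to produce a histogram like Figure 2-8 is sketched below. It assumes, as a simplification not stated in the text, that an address's activity duration is the number of whole days between its first and last observed transactions; the addresses and timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Bins taken from the x-axis of Figure 2-8.
LABELS = ["0-5"] + [f"{lo}-{lo + 4}" for lo in range(6, 40, 5)] + [">40"]


def bucket(days):
    """Map an activity duration in whole days to its Figure 2-8 bin."""
    if days <= 5:
        return "0-5"
    if days > 40:
        return ">40"
    lo = 6 + 5 * ((days - 6) // 5)
    return f"{lo}-{lo + 4}"


def activity_histogram(address_txs):
    """Percentage of addresses per bin; address_txs maps address -> timestamps."""
    counts = Counter()
    for stamps in address_txs.values():
        # Assumption: activity = time between first and last transaction.
        days = (max(stamps) - min(stamps)).days
        counts[bucket(days)] += 1
    total = sum(counts.values())
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical addresses: one active for 2 days, one for 50 days.
txs = {
    "addr_a": [datetime(2014, 1, 1), datetime(2014, 1, 3)],
    "addr_b": [datetime(2014, 1, 1), datetime(2014, 2, 20)],
}
hist = activity_histogram(txs)
print(hist)
```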
2.5 Related Work
Ransomware and Underground Economy. Various security vendors have reported the threat potential of ransomware attacks based on the number of infections they observed [11, 86, 101]. The use of cryptography to mount extortion-based attacks was first introduced in [110]. Young [111] showed how Microsoft Cryptographic API (MS CAPI) calls can be used to design cryptovirus samples, demonstrating how to use MS CAPI to generate keys and encrypt the user's data.
The first analysis of specific ransomware families was carried out by Gazet, who examined three primitive ransomware families [43]. He concluded that while these early families were designed for massive propagation, they did not fulfill the basic requirements (e.g., sufficiently long encryption keys) for mass extortion.
Scareware in the form of rogue security software has also been studied over the past few years. Stone-Gross et al. analyzed the underground economy of fake antivirus software. They built an economic model showing how cybercriminals performed refunds and chargebacks in order to conceal their criminal nature for a longer period of time [99]. Cova et al. analyzed the structure of fake antivirus operations and measured the number of victims and the profits gained, based on the web servers used by several fake antivirus groups [36].
Bitcoin Privacy. The security and anonymity of Bitcoin have also recently received considerable interest in security research. Meiklejohn et al. developed a clustering heuristic to group Bitcoin addresses belonging to a particular user [74]. They discussed the potential anonymity of the Bitcoin protocol and the actual anonymity achieved by users. Reid et al. constructed two graphs based on the publicly available transaction history [42]. They used the properties of these graphs to illustrate how information leakage can be used to de-anonymize the system's users. Using this technique, they described the flow of stolen money from MyBitcoin. Recently, Ron et al. analyzed the user graph and provided an in-depth analysis of the largest transactions in Bitcoin history [93]. In another work, Möser analyzed the anonymity and transaction graphs of three Bitcoin mix services. He found that all three Bitcoin mix services had distinct transaction graph patterns, but some of them were more successful than others [80]. To characterize the popularity of illicit goods, Christin analyzed data extracted from the Silk Road marketplace [30]. Although that work does not examine the Bitcoin blockchain, it provides an estimate of the market value of such transactions.
Closest to our interest is the concurrent work by Spagnuolo et al., who parsed the blockchain and clustered the Bitcoin addresses that were likely to belong to certain users or groups [97]. They labeled the users based on information scraped from openly available resources, and were able to label Bitcoin addresses in real-world cases such as Silk Road and the Cryptolocker ransomware. We also used public repositories to extract Bitcoin addresses that belong to the cybercriminals behind ransomware attacks. However, unlike the work of Spagnuolo et al. [97], our goal is to characterize the Bitcoin addresses used for malicious purposes based on their transaction history rather than to de-anonymize Bitcoin users.
2.6 Conclusions
In this work, we performed a long-term analysis of ransomware families with a special focus on their destructive functionality. The characterization of ransomware attacks was based on 1,359 ransomware samples among 15 families that have emerged over the last few years. Our results show that a significant number of ransomware families share very similar characteristics in the core part of the attacks, but still lack reliable destructive functions to successfully target victims' files.
We also described how a malicious process interacts with the file system when a compromised computer is under a ransomware attack. We observed that the suspicious file system activity of multiple types of destructive ransomware families can be reliably monitored. When looking at the execution traces of the malware programs, we observed that the way malicious processes generate requests to access the file system was significantly different from that of benign processes. We also observed that different classes of ransomware attacks with multiple levels of sophistication share very similar characteristics from a file system perspective, due to the nature of these attacks. Contrary to recent discussions in the security community about ransomware attacks, our analysis suggests that implementing practical defense mechanisms is still possible if we effectively monitor file system activity, for example, changes in the Master File Table (MFT) or the types of I/O Request Packets (IRPs) generated on behalf of processes to access the file system. We propose a general methodology that allows us to detect a significant number of ransomware attacks without making any assumptions about how samples attack users' files.
Chapter 3
A Dynamic Analysis Approach to
Detecting Ransomware
3.1 Introduction
Malware remains one of the most important security threats on the Internet today. Recently, a specific form of malware called ransomware has become very popular with cybercriminals. Although the concept of ransomware is not new (such attacks were registered as far back as the end of the 1980s), the recent success of ransomware has resulted in an increasing number of new families in the last few years [11, 52, 56, 101, 103]. For example, CryptoWall 3.0 made headlines around the world as a highly profitable ransomware family, causing an estimated $325M in damages [102]. As another example, the Sony ransomware attack [64] received extensive media attention, and the U.S. government even took the official position that North Korea was behind the attack.
Ransomware operates in many different ways, from simply locking the desktop of the infected computer to encrypting all of its files. Ransomware also behaves differently from traditional malware. For example, traditional malware typically aims to achieve stealth so that it can collect banking credentials or keystrokes without raising suspicion. In contrast, ransomware behavior is in direct opposition to stealth, since the entire point of the attack is to openly notify the user that she is infected.
Today, an important enabler for behavior-based malware detection is dynamic analysis. These systems execute a captured malware sample in a controlled environment and record its behavior (e.g., system calls, API calls, and network traffic). Unfortunately, malware detection systems that focus on stealthy malware behavior (e.g., suspicious operating system functionality for keylogging) might fail to detect ransomware, because this class of malicious code engages in activity that appears similar to that of benign applications that use encryption or compression. Furthermore, these systems are currently not well-suited for detecting the specific behaviors that ransomware engages in, as evidenced by misclassifications of ransomware families by AV scanners [27, 92].
In this chapter, we present a novel dynamic analysis system that is designed to analyze and detect ransomware attacks and model their behaviors. In our approach, the system automatically creates an artificial, realistic execution environment and monitors how ransomware interacts with that environment. Closely monitoring process interactions with the filesystem allows the system to precisely characterize cryptographic ransomware behavior. In parallel, the system tracks changes to the computer's desktop that indicate ransomware-like behavior. The key insight is that in order to be successful, ransomware needs to access and tamper with a victim's files or desktop. Our automated approach, called Unveil, allows the system to analyze many malware samples at a large scale, and to reliably detect and flag those that exhibit ransomware-like behavior. In addition, the system is able to provide insights into how the ransomware operates, and how to automatically distinguish between different classes of ransomware.
We implemented a prototype of Unveil in Windows on top of the popular open source malware analysis framework Cuckoo Sandbox [37]. Our system is implemented through custom Windows kernel drivers that provide monitoring capabilities for the filesystem. Furthermore, we added components that run outside the sandbox to monitor the user interface of the target computer system.
We performed a long-term study analyzing 148,223 recent general malware samples in the wild. Our large-scale experiments show that Unveil was able to correctly detect 13,637 ransomware samples from multiple families in live, real-world data feeds with no false positives. Our evaluation also suggests that current malware analysis systems may not yet have accurate behavioral models to detect different classes of ransomware attacks. For example, the system was able to correctly detect 7,572 ransomware samples that were previously unknown and undetected by traditional AVs, but belonged to modern file locker ransomware families. Unveil was also able to detect a new type of ransomware that had not previously been reported by any security company. This ransomware also did not show any malicious activity in a modern sandboxing technology provided by a well-known anti-malware company, while showing heavy file encryption activity when analyzed by Unveil.
The high detection rate of our approach suggests that Unveil can complement current malware analysis systems to quickly identify new ransomware samples in the wild. Unveil can be easily deployed on any malware analysis system by simply attaching to the filesystem driver in the analysis environment.
In summary, this work makes the following contributions:
• We present a novel technique to detect ransomware known as file lockers, which target files stored on a victim's computer. Our technique is based on monitoring system-wide filesystem accesses in combination with the deployment of automatically-generated artificial user environments for triggering ransomware.
• We present a novel technique to detect ransomware known as screen lockers. Such ransomware prevents access to the computer system itself. Our technique is based on detecting locked desktops using dissimilarity scores of screenshots taken from the analysis system's desktop before, during, and after executing the malware sample.
• We performed a large-scale evaluation to show that our approach can effectively detect ransomware. We automatically detected and verified 13,637 ransomware samples from a dataset of 148,223 recent general malware samples. In addition, we found one previously unknown ransomware sample that does not belong to any previously reported family. Our evaluation demonstrates that our technique works well in practice (achieving a true positive (TP) rate of 96.3% at zero false positives (FPs)), and is useful in automatically identifying ransomware samples submitted to analysis and detection systems.
The rest of the chapter is structured as follows. In Section 3.2, we briefly present background information and explain different classes of ransomware attacks. In Section 3.3, we describe the architecture of Unveil and explain our detection approaches for multiple types of ransomware attacks. In Section 3.4, we provide more details about our dynamic analysis environment. In Section 3.5, we present the evaluation results. Limitations of the approach are discussed in Section 3.6, while Section 3.7 presents related work. Finally, Section 3.8 concludes the chapter.
3.2 Background
Ransomware, like other classes of malware, uses a number of strategies to evade detection, propagate, and attack users. For example, it can perform multi-infection or process injection, exfiltrate the user's information to a third party, encrypt files, and establish secure communication with C&C servers. Our detection approach assumes that ransomware samples can and will use all of the techniques that other malware samples may use. In addition, our system assumes that successful ransomware attacks perform one or more of the following activities.
Persistent desktop message. After successfully infecting a system, the malicious program typically displays a message to the victim. This "ransom note" informs the users that their computer has been "locked" and provides instructions on how to make a ransom payment to restore access. The ransom message can be generated in different ways. A popular technique is to call dedicated API functions (e.g., CreateDesktop()) to create a new desktop and make it the default configuration, locking the victim out of the compromised system. Malware writers can also use HTML or create other forms of persistent windows to display this message. Displaying a persistent desktop message is a classic action in many ransomware attacks.
Indiscriminate encryption and deletion of the user's private files. A crypto-style ransomware attack lists the victim's files and aggressively encrypts any private files it discovers. Access is restricted by withholding the decryption key. Encryption keys can be generated locally by the malware on the victim's computer, or remotely on C&C servers and then delivered to the compromised computer. An attacker can use customized destructive functions or Windows API functions to delete the user's original files. The attacker can also overwrite files with the encrypted version, or use secure deletion via the Windows Secure Deletion API.
Selective encryption and deletion of the user's private files based on certain attributes (e.g., size, date accessed, extension). In order to avoid detection, a significant number of ransomware samples encrypt a user's private files selectively. In the simplest form, the ransomware sample can list the files based on the access date. In more sophisticated scenarios, the malware could also open an application (e.g., word.exe) and list recently accessed files. The sample can also inject malicious code into any Windows application to obtain this type of information (e.g., directly reading process memory).
In this work, we address all of these scenarios where an adversary has already compromised a system, and is able to launch arbitrary ransomware-related operations on the user's files or desktop.
3.3 Unveil Design
In this section, we describe our techniques for detecting multiple classes of ransomware attacks. We refer the reader to Section 3.4 for the implementation details of the prototype.
3.3.1 Detecting File Lockers
We first describe why our system creates a unique, artificial user environment in each malware run. We then present the design of the filesystem activity monitor and describe how Unveil uses the output of the filesystem monitor to detect ransomware.
Generating Artificial User Environments
Protecting malware analysis environments against fingerprinting techniques is non-trivial in a real-world deployment. Sophisticated malware authors exploit static features inside analysis systems (e.g., the name of a computer) and launch reconnaissance-based attacks [70] to fingerprint both public and private malware analysis systems.
The static features of analysis environments can be viewed as the Achilles' heel of malware analysis systems. One static feature that can have a significant impact on the effectiveness of malware analysis systems is the user data, which can be used to fingerprint the analysis environment. That is, even on bare-metal environments where classic tricks such as virtualization checks are not possible, an unrealistic-looking user environment can be a telltale sign that the code is running in a malware analysis system.
Intuitively, a possible approach to address such reconnaissance attacks is to build the user environment in such a way that the user data is valid, real, and non-deterministic in each malware run. These automatically-generated user environments serve as an "enticing target" to encourage ransomware to attack the user's data, while at the same time preventing adversaries from recognizing the analysis environment.
In practice, generating a user environment is a non-trivial problem, especially if this is to be done automatically. This is because the content generator must not allow the malware author to fingerprint the automatically-generated user content located in the analysis environment and determine that it does not belong to a real user. We elaborate on how we automatically generate an artificial, yet realistic, user environment for ransomware in each malware run in Section 3.4.1.
Filesystem Activity Monitor
The filesystem monitor in Unveil has direct access to the data buffers involved in I/O requests, giving the system full visibility into all filesystem modifications. Each recorded I/O operation contains the process name, timestamp, operation type, filesystem path, and the pointers to the data buffers with the corresponding entropy information in read/write requests. The generation of I/O requests happens at the lowest possible layer to the filesystem. For example, there are multiple ways to read, write, or list files in user or kernel mode, but all of these functions are ultimately converted to a sequence of I/O requests. Whenever a user thread invokes an I/O API, an I/O request is generated and passed to the filesystem driver. Figure 3-1 shows a high-level design of Unveil in the Windows environment.
Unveil's monitor sets callbacks on all I/O requests to the filesystem generated on behalf of any user-mode processes. We note that for Unveil operations, it is desirable to only set one callback per I/O request for performance reasons, and that this also maintains full visibility into I/O operations. In Unveil, user-mode process interactions with the filesystem are formalized as access patterns. We consider access patterns in terms of I/O traces, where a trace $T$ is a sequence of $t_i$ such that
$$t_i = \langle P, F, O, E \rangle,$$
where
$P$ is the set of user-mode processes,
$F$ is the set of available files,
$O$ is the set of I/O operations, and
$E$ is the entropy of read or write data buffers.
For all of the file locker ransomware samples that we studied, we empirically observed that these samples issue I/O traces that exhibit distinctive, repetitive patterns. This is because each of these samples uses a single, specific strategy to deny access to the user's files. This attack strategy is accurately reflected in the form of I/O access patterns that are repeated for each file when performing the attack. Consequently, these I/O access patterns can be extracted as a distinctive I/O fingerprint for a particular family. We note that our approach mainly considers write or delete requests. We elaborate on extracting I/O access patterns per file later in this section.
I/O Data Buffer Entropy. For every read and write request to a file captured in an I/O trace, Unveil computes the entropy of the corresponding data buffer. Comparing the entropy of read and write requests to and from the same file offset serves as an excellent indicator of crypto-ransomware behavior. This is due to the common strategy of reading in the original file data, encrypting it, and overwriting the original data with the encrypted version. The system uses Shannon entropy [69] for this computation. In particular, assuming a uniform random distribution of bytes in a data block $d$, where $n$ is the number of equally likely byte values, we have
$$H(d) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n.$$
[Diagram: user-mode processes 1 to N issue read, write, and delete requests; in kernel mode, Unveil's I/O access monitor (I/O Monitor ENTER) identifies the process, the file operation, and the I/O type, records the I/O request, and calculates the entropy of the file's data buffer (I/O Monitor EXIT) before the request continues to the I/O scheduler, the filesystem driver, and the physical device.]
Figure 3-1: Overview of the design of the I/O access monitor in Unveil.
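The sketch below computes the Shannon entropy of a byte buffer from its observed byte frequencies, $H = -\sum_i p_i \log_2 p_i$, which reaches the uniform-case value $\log_2 256 = 8$ bits per byte, and compares a read buffer with the write buffer for the same offset. The jump threshold ENTROPY_JUMP and the helper names are hypothetical choices for illustration, not parameters taken from Unveil.

```python
import math
import os
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    # c / n is the observed frequency p_i of each distinct byte value.
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())


# Hypothetical threshold: how much higher (in bits per byte) the written
# buffer's entropy must be than the read buffer's at the same offset.
ENTROPY_JUMP = 2.0


def looks_encrypted(read_buf: bytes, write_buf: bytes) -> bool:
    """Flag a write that replaces low-entropy data with high-entropy data."""
    return shannon_entropy(write_buf) - shannon_entropy(read_buf) > ENTROPY_JUMP


plaintext = b"Quarterly report: revenue grew in every region. " * 80
ciphertext = os.urandom(len(plaintext))  # stand-in for encrypted output
print(round(shannon_entropy(plaintext), 2), round(shannon_entropy(ciphertext), 2))
print(looks_encrypted(plaintext, ciphertext))
```

Compressed files such as zip archives also have high entropy, so a detector would compare the read and write buffers rather than look at the written buffer alone, as the text describes.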
Constructing Access Patterns. For each execution, after Unveil generates I/O access traces for the sample, it sorts the I/O access requests based on file names and request timestamps. This allows the system to extract the I/O access sequence for each file in a given run, and check which processes accessed each file. The key idea is that after sorting the I/O access requests per file, repetition can be observed in the way I/O requests are generated on behalf of the malicious process.
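As a minimal sketch of this step, the Python code below models each trace entry as a record holding the ⟨P, F, O, E⟩ fields plus a timestamp, sorts the records by file name and timestamp, and reports the most common per-file operation sequence, which is where the repetition shows up. The record layout, process names, and paths are hypothetical; Unveil's actual monitor is a C++ kernel driver.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class IORecord:
    """One trace entry: process (P), file (F), operation (O), entropy (E)."""
    timestamp: float
    process: str
    path: str
    op: str          # e.g. "open", "read", "write", "delete", "close"
    entropy: float   # buffer entropy for reads/writes


def per_file_sequences(trace):
    """Sort by file name, then timestamp; map each path to its op sequence."""
    ordered = sorted(trace, key=lambda r: (r.path, r.timestamp))
    return {
        path: tuple(r.op for r in group)
        for path, group in groupby(ordered, key=lambda r: r.path)
    }


def dominant_pattern(trace):
    """Most frequent per-file op sequence and how many files share it."""
    return Counter(per_file_sequences(trace).values()).most_common(1)[0]


def attack(path, t0):
    """Hypothetical in-place encryption of one file by sample.exe."""
    ops = ["open", "read", "write", "close"]
    # Entropy values are left at 0.0 here for brevity.
    return [IORecord(t0 + i, "sample.exe", path, op, 0.0) for i, op in enumerate(ops)]


trace = attack("C:/Users/u/Documents/a.docx", 0) + attack("C:/Users/u/Documents/b.pdf", 10)
trace.append(IORecord(5, "explorer.exe", "C:/Users/u/Pictures/c.jpg", "read", 4.0))
pattern, n_files = dominant_pattern(trace)
print(pattern, n_files)
```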
The particular detection criterion used by the system to detect ransomware samples is to identify write and delete operations in I/O sequences in each malware run. In a successful ransomware attack, the malicious process typically aims to encrypt, overwrite, or delete user files at some point during the attack. In Unveil, these I/O request patterns raise an alarm, and are detected as suspicious filesystem activity. We studied different file locker ransomware samples across different ransomware families. Our analysis shows that although these attacks can be very different in their attack strategies (e.g., evasion techniques, key generation, key management, connecting to C&C servers), they can be categorized into three main classes of attacks based on their access requests.
[Diagram: the three per-file I/O access patterns. (1) File x is opened, read, encrypted, and overwritten in place (Open, Read, Write, Close). (2) File x is read (Open, Read, Close), its encrypted data is written to File x.locked (Open, Write, Close), and File x is deleted (Open, Delete, Close). (3) File x is read, its encrypted data is written to File x.locked, and File x is overwritten.]
Figure 3-2: Different attack strategies among ransomware families with respect to I/O access patterns.
Figure 3-2 shows the high-level access patterns for multiple ransomware families we studied during our experiments. For example, the access pattern shown on the left is indicative of Cryptolocker variants, which have varying key lengths and desktop locking techniques; however, the access pattern remains constant across family variants. We observed the same I/O activity for samples in the CryptoWall family as well. While these are identified as two different ransomware families, they have similar I/O patterns when they attack user files, since they use the same encryption functions (i.e., the Microsoft CryptoAPI) to encrypt files.
As another example, in the FileCoder family, the ransomware first creates a new file, reads data from a victim's file, generates an encrypted version of the original data, writes the encrypted data buffer to the newly generated file, and simply unlinks the original user's file (see Figure 3-2.2). In this class of file locker ransomware, the malware does not wipe the original file's data from the disk. For attack approaches like this, victims have a high chance of recovering their data without paying the ransom. In the third approach (Figure 3-2.3), however, the ransomware creates a new encrypted file based on the original file's data and then securely deletes the original file's data using either standard Windows APIs or custom overwriting implementations (e.g., the CrypVault family).
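The three strategies can be told apart from per-file operation sequences. The hypothetical classifier below does so under one stated assumption: the encrypted copy's file name extends the original's (as with File x and File x.locked in Figure 3-2). A file that matches none of the patterns is left unflagged, mirroring the criterion that only write and delete operations raise an alarm.

```python
def classify_strategy(seqs, original):
    """Map one user file onto the strategies of Figure 3-2 (hypothetical).

    seqs maps each file path to its time-ordered tuple of operations.
    Returns 1, 2, or 3, or None if no destructive pattern is seen.
    """
    orig_ops = seqs.get(original, ())
    # Assumption: the encrypted copy's name extends the original's name.
    companion_written = any(
        path != original and path.startswith(original) and "write" in ops
        for path, ops in seqs.items()
    )
    if companion_written and "write" in orig_ops:
        return 3  # new encrypted file, original data overwritten (wiped)
    if companion_written and "delete" in orig_ops:
        return 2  # new encrypted file, original only unlinked (recoverable)
    if "read" in orig_ops and "write" in orig_ops:
        return 1  # original read, encrypted, and overwritten in place
    return None


examples = {
    1: {"x.doc": ("open", "read", "write", "close")},
    2: {"x.doc": ("open", "read", "close", "open", "delete", "close"),
        "x.doc.locked": ("open", "write", "close")},
    3: {"x.doc": ("open", "read", "close", "open", "write", "close"),
        "x.doc.locked": ("open", "write", "close")},
}
for expected, seqs in examples.items():
    print(expected, classify_strategy(seqs, "x.doc"))
```

On its own, pattern (1) also matches a benign editor saving a file, so in practice it would presumably be combined with the read/write entropy comparison described earlier in this section.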
3.3.2 Detecting Screen Lockers
The second core component of Unveil is aimed at detecting screen locker ransomware. The key insight behind this component is that the attacker must display a ransom note to the victim in order to receive a payment. In most cases, the message is prominently displayed, covering a significant part, or all, of the display. As this ransom note is a virtual invariant of ransomware attacks, Unveil aims to automatically detect the display of such notes.
The approach adopted by Unveil to detect screen locking ransomware is to monitor the desktop of the victim machine, and to attempt to detect the display of a ransom note. Similar to Grier et al. [45], we take automatic screenshots of the analysis desktop before and after the sample is executed. The screenshots are captured from outside of the dynamic analysis environment to prevent potential tampering by the malware. This series of screenshots is analyzed and compared using image analysis methods to determine if a large part of the screen has suddenly changed between captures. However, smaller changes in the image, such as the location of the mouse pointer, the current date and time, new desktop icons, windows, and visual changes in the task bar, should be rejected as inconsequential.
In Unveil, we measure the structural similarity (SSIM) [106] of two screenshots, taken before and after sample execution, by comparing local patterns of pixel intensities in terms of both luminance and contrast as well as the structure of the two images. Extracting structural information is based on the observation that pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry information about the structure of the objects in the image. After a successful ransomware attack, the display of the ransom note often results in automatically identifiable changes in the structural information of the screenshot (e.g., a large rectangular object covers a large part of the desktop). Therefore, the similarity of the pre- and post-attack images decreases significantly, and can be used as an indication of ransomware.
In order to avoid false positives, Unveil only takes screenshots resulting from persistent changes (i.e., changes that cannot be easily dismissed through user interaction). The system first removes transient changes (e.g., by automatically closing open windows) before taking screenshots of the desktop. Using this preprocessing step, ransomware-like applications that are developed for other purposes, such as fake AV, are safely categorized as non-ransomware samples.
Unveil also extracts the text within the area where changes in the structure of the image have occurred, and searches it for specific keywords that are highly correlated with ransom notes (e.g., <lock, encrypt, desktop, decryption, key>).
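A minimal keyword check over the extracted text could look like the sketch below. It assumes the text has already been recovered from the changed region (the OCR step is not shown), and treating keywords as word prefixes, so that "locked" and "encrypted" also match, is an assumption of this sketch.

```python
import re

# Keywords from the text; matching them as word prefixes is an assumption.
RANSOM_KEYWORDS = ("lock", "encrypt", "desktop", "decryption", "key")


def ransom_keyword_hits(text):
    """Return the sorted keywords that start any word in the extracted text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({k for k in RANSOM_KEYWORDS for w in words if w.startswith(k)})


note = "Your files are ENCRYPTED. Pay 2 BTC to receive the decryption key."
print(ransom_keyword_hits(note))
```

Prefix matching trades precision for recall (for example, "keyboard" would match "key"), which is one reason such a text check would be used alongside the image similarity test rather than on its own.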
Given two screenshots $X$ and $Y$, we define the structural similarity index of the image contents of local windows $x_j$ and $y_j$ as
$$\mathrm{LocalSim}(x_j, y_j) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$
where $\mu_x$ and $\mu_y$ are the mean intensities of $x_j$ and $y_j$, $\sigma_x$ and $\sigma_y$ are their standard deviations (serving as estimates of contrast), and $\sigma_{xy}$ is the covariance of $x_j$ and $y_j$. The local window size used to compare the content of two images was set to $8 \times 8$. $c_1$ and $c_2$ are division stabilizers in the SSIM index formula [106]. We define the overall similarity between the two screenshots $X$ and $Y$ as the arithmetic mean of the similarity of the image contents $x_j$ and $y_j$ at the $j$th local window, where $M$ is the number of local windows of $X$ and $Y$:
$$\mathrm{ImgSim}(X, Y) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{LocalSim}(x_j, y_j).$$
Since the overall similarity is at most 1 (identical images score exactly 1), the distance between $X$ and $Y$ is simply defined as
$$\mathrm{Dist}(X, Y) = 1 - \mathrm{ImgSim}(X, Y).$$
Finally, we define a similarity threshold $\tau_{sim}$ such that Unveil considers the sample a potential screen locking ransomware if
$$\mathrm{Dist}(X, Y) > \tau_{sim}.$$
Unveil then extracts the text within the image and searches for ransomware-related words within the modified area. Applying the image similarity test with the best similarity threshold (see Section 3.5.2) gives us the highest recall with 100% precision for the entire dataset.
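The sketch below implements LocalSim, ImgSim, and Dist in plain Python on grayscale images stored as lists of rows. Several details are assumptions rather than values from this chapter: c1 and c2 use the usual SSIM constants for 8-bit images ($c_1 = (0.01 \cdot 255)^2$, $c_2 = (0.03 \cdot 255)^2$), the 8×8 windows are non-overlapping tiles rather than a sliding window, and TAU_SIM = 0.3 is a placeholder for the empirically chosen threshold.

```python
from statistics import fmean

# Assumed stabilizers: standard SSIM constants for 8-bit images
# (K1 = 0.01, K2 = 0.03, L = 255). The chapter does not state its values.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2
WIN = 8          # 8x8 local windows, as in the text
TAU_SIM = 0.3    # hypothetical placeholder for the tuned threshold


def local_sim(xs, ys):
    """LocalSim for two equally sized windows given as flat pixel lists."""
    mx, my = fmean(xs), fmean(ys)
    vx = fmean((x - mx) ** 2 for x in xs)
    vy = fmean((y - my) ** 2 for y in ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))


def windows(img):
    """Yield non-overlapping WIN x WIN tiles of a grayscale image (list of rows)."""
    for r in range(0, len(img) - WIN + 1, WIN):
        for c in range(0, len(img[0]) - WIN + 1, WIN):
            yield [img[r + i][c + j] for i in range(WIN) for j in range(WIN)]


def img_sim(X, Y):
    """ImgSim: arithmetic mean of LocalSim over the M local windows."""
    return fmean(local_sim(x, y) for x, y in zip(windows(X), windows(Y)))


def dist(X, Y):
    return 1.0 - img_sim(X, Y)


def is_potential_screen_locker(before, after):
    return dist(before, after) > TAU_SIM


# Hypothetical 32x32 desktop, a ransom note covering its top three quarters,
# and a small cursor-sized change.
before = [[100 + (r + c) % 50 for c in range(32)] for r in range(32)]
locked = [[0] * 32 if r < 24 else row[:] for r, row in enumerate(before)]
cursor = [row[:] for row in before]
cursor[5][5] = cursor[5][6] = 255
print(round(dist(before, locked), 3), round(dist(before, cursor), 3))
```

The two test images illustrate the intended behavior: a large overlay drives Dist toward the covered fraction of the screen, while a change confined to one window can lower ImgSim by at most 2/M, since each LocalSim lies in [-1, 1].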
3.4 Unveil Implementation
In this section, we describe the implementation details of a prototype of Unveil for the Windows platform. We chose Windows for a proof-of-concept implementation because it is currently the main target of ransomware attacks. We elaborate on how Unveil automatically generates artificial, but realistic, user environments for each analysis run, how the system-wide monitoring was implemented, and how we deployed the prototype of our system.
3.4.1 Generating User Environments
In each run, the user environment is made up of several forms of content, such as digital images, videos, audio files, and documents, that can be accessed during a user's Windows session. The user content is automatically generated according to the following process:
For each file extension from a space of possible extensions, a set of files is generated, where the number of files for each extension is sampled from a uniform random distribution for each sample execution. Together, these sets of files form a document space for the sample execution environment. From a statistical perspective, document spaces generated for each sample execution should be indistinguishable from real user data. As an approximation to this ideal, randomly-selected numbers of files are generated per extension for each run according to the process described above.
In the following, we describe the additional properties that a document space should have in order to complicate programmatic approaches that ransomware samples can potentially use to identify the automatically-generated user environment.
Valid Content. The user content generator creates real files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, OpenSSL). Based on empirical observation, we created four file categories that a typical ransomware sample tries to find and encrypt: documents, keys and licenses, file archives, and media. Document extensions include txt, doc(x), ppt(x), tex, xls(x), c, pdf, and py. Key and license extensions include key, pem, crt, and cer. Archive extensions include zip and rar files. Finally, media extensions include jp(e)g, mp3, and avi. For each sample execution, a subset of extensions is randomly selected and used to generate user content across the system.
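The sampling step described above can be sketched as follows. The extension list comes from the text, while the bounds on the number of files per extension (min_files, max_files) are hypothetical, since the chapter does not report them; generating the actual file contents (e.g., with python-docx) is omitted.

```python
import random

# File categories and extensions listed in the text.
CATEGORIES = {
    "documents": ["txt", "doc", "docx", "ppt", "pptx", "tex",
                  "xls", "xlsx", "c", "pdf", "py"],
    "keys_and_licenses": ["key", "pem", "crt", "cer"],
    "archives": ["zip", "rar"],
    "media": ["jpg", "jpeg", "mp3", "avi"],
}


def plan_document_space(rng, min_files=5, max_files=50):
    """Choose a random subset of extensions and a file count for each.

    Counts are drawn uniformly from [min_files, max_files]; both bounds are
    hypothetical, as the text does not report them.
    """
    all_exts = [ext for exts in CATEGORIES.values() for ext in exts]
    chosen = rng.sample(all_exts, k=rng.randint(1, len(all_exts)))
    return {ext: rng.randint(min_files, max_files) for ext in chosen}


plan = plan_document_space(random.Random(2017))
print(plan)
```

Passing an explicit random.Random instance keeps each run reproducible for debugging while still letting every analysis run use a different seed.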
In order to generate content that appears meaningful, we collected approximately 100,000 sentences by querying 500 English words in Google. For each query, we collected the text from the first 30 search results to create a sentence list. We use the collected sentences to generate the content for the user files. We used the same technique to create a word list for naming the user files. The word list allows us to create files with variable name lengths that do not appear random. Clearly, the problem with random content and name generation (e.g., xteyshtfqb.docx) is that the attacker could programmatically calculate the entropy of the file names and contents to detect content that has been generated automatically. Hence, by generating content that appears meaningful, we make it difficult for the attacker to fingerprint the system and detect our generated files.
File Paths. Note that the system is also careful to randomly generate the supposed victim's directory structure. For example, directory names are also generated based on meaningful words. Furthermore, the system also associates files of certain types with standard locations in the Windows directory structure for those file types (e.g., the system does not create document files in a directory with image files, but rather under My Documents). The path length of user files is also non-deterministic and is generated randomly. In addition, each folder may have a set of sub-folders. Consequently, the generated paths to user files have variable depths relative to the root folder.
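A sketch of meaningful file names and variable-depth paths is given below. The short word list, the user name alice, and the per-type base folders are hypothetical stand-ins for the Google-derived word list and the standard Windows locations described above.

```python
import random
from pathlib import PureWindowsPath

# Hypothetical stand-in for the word list built from search results.
WORDS = ["budget", "meeting", "notes", "summer", "project", "draft",
         "family", "invoice", "garden", "travel", "report", "recipes"]

# Hypothetical per-type standard locations for a user named "alice".
BASE_DIRS = {
    "docx": r"C:\Users\alice\Documents",
    "pdf": r"C:\Users\alice\Documents",
    "jpg": r"C:\Users\alice\Pictures",
    "mp3": r"C:\Users\alice\Music",
}


def make_name(rng, max_words=3):
    """Join 1..max_words distinct dictionary words so names look human-made."""
    return "_".join(rng.sample(WORDS, rng.randint(1, max_words)))


def make_path(rng, ext, max_depth=3):
    """Place a file under its type's standard folder, 0..max_depth levels deep."""
    subdirs = [make_name(rng, 2) for _ in range(rng.randint(0, max_depth))]
    return PureWindowsPath(BASE_DIRS[ext], *subdirs, f"{make_name(rng)}.{ext}")


rng = random.Random(7)
paths = [make_path(rng, ext) for ext in ("docx", "jpg", "pdf")]
for p in paths:
    print(p)
```

PureWindowsPath only manipulates path strings, so the sketch runs on any operating system without touching the filesystem.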
Time Attributes. Another non-determinism strategy used by our approach is to generate files with different creation, modification, and access times. The file time attributes are sampled from a distribution of likely timestamps when creating the file. When the system creates files with different time attributes, the time attributes of the containing folders are also updated automatically. In this case, the creation time of a folder is the minimum of all creation times of the files and folders inside it, while its modification and access times are the maximum of the corresponding timestamps.
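The folder rule above (creation time is the minimum over the contents, modification and access times are the maximum) can be applied bottom-up over the generated tree. In the sketch below, the timestamp sampler, which draws a creation time uniformly from the last three years and then orders modification and access times after it, is a hypothetical stand-in for the chapter's unspecified distribution of likely timestamps.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Node:
    """A generated file (no children) or folder."""
    name: str
    created: Optional[datetime] = None
    modified: Optional[datetime] = None
    accessed: Optional[datetime] = None
    children: List["Node"] = field(default_factory=list)


def sample_file_times(rng, now, max_age_days=3 * 365):
    """Hypothetical sampler enforcing created <= modified <= accessed <= now."""
    created = now - timedelta(days=rng.uniform(0, max_age_days))
    modified = created + (now - created) * rng.random()
    accessed = modified + (now - modified) * rng.random()
    return created, modified, accessed


def propagate_times(node):
    """Bottom-up: folder created = min over contents, modified/accessed = max."""
    if node.children:
        for child in node.children:
            propagate_times(child)
        node.created = min(c.created for c in node.children)
        node.modified = max(c.modified for c in node.children)
        node.accessed = max(c.accessed for c in node.children)
    return node


rng = random.Random(1)
now = datetime(2017, 12, 1)
files = [Node(f"doc{i}.docx", *sample_file_times(rng, now)) for i in range(3)]
work = Node("Work", children=files[:2])
root = propagate_times(Node("Documents", children=[work, files[2]]))
print(root.created, root.modified, root.accessed)
```

Applying the times to a real Windows volume would additionally require setting the file and folder timestamps through the operating system, which this sketch does not do.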
While we have not observed ransomware samples that attempt to use fingerprinting heuristics on the content of the analysis environment, the non-determinism strategies used by Unveil serve as a basis for making the analysis resilient to fingerprinting by design.
3.4.2 Filesystem Activity Monitor
Several techniques have been used to monitor the filesystem activity of samples in malware analysis environments. For example, filesystem activity can be monitored by hooking a list of relevant filesystem API functions or relevant system calls using the System Service Descriptor Table (SSDT). Unfortunately, these approaches are not suitable for Unveil's detection approach for several reasons. First, API hooking can be bypassed by simply copying a DLL containing the desired code and dynamically loading it into the process' address space under a different name. Stolen code [48, 51] and sliding calls [51] are other examples of API hooking evasion that are common in the wild. Furthermore, ransomware can use customized cryptosystems instead of the standard APIs to bypass API hooking while encrypting user files. Hooking system calls via the SSDT also has technical limitations. For example, it is blocked on 64-bit systems by Kernel Patch Protection (KPP). Furthermore, most SSDT functions are undocumented and subject to change across different versions of Windows.
Therefore, instead of API or system call hooking, Unveil monitors filesystem I/O activity using the Windows Filesystem Minifilter Driver framework [78], which is a standard kernel-based approach to achieving system-wide filesystem monitoring in multiple versions of Windows. The prototype consists of two main components, one for I/O monitoring and one for retrieving logs of the entire system, comprising approximately 2,800 SLOC of C++. In Windows, I/O requests are represented by I/O Request Packets (IRPs). Unveil's monitor sets callbacks on all I/O requests to the filesystem generated on behalf of user-mode processes. Basing Unveil's filesystem monitor on a minifilter driver allows it to be located at the closest possible layer to the filesystem, with access to nearly all objects of the operating system.
3.4.3 Desktop Lock Monitor
To identify desktop locking ransomware, screenshots are captured from outside of the dynamic analysis environment to prevent potential tampering by the malware. For dissimilarity testing, a Python script implements the Structural Similarity Image Metric (SSIM) as described in Section 3.3.2. Unveil first converts the images to floating point data, and then calculates parameters such as the mean intensity µ using Gaussian filtering of the images' contents. We used the default values k1 = 0.01 and k2 = 0.03 to obtain c1 and c2, which are used to calculate the structural similarity score in local windows as presented in Section 3.3.2.
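A minimal NumPy sketch of SSIM with Gaussian-weighted local windows follows. The 11×11 window with σ = 1.5 and the 8-bit dynamic range (so c1 = (0.01·255)² and c2 = (0.03·255)²) are the common defaults of the standard SSIM formulation and are assumptions here; the text does not state Unveil's window parameters.

```python
import numpy as np


def gaussian_kernel(size: int = 11, sigma: float = 1.5) -> np.ndarray:
    """1-D normalized Gaussian window (assumed parameters, see lead-in)."""
    ax = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_filter(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Separable Gaussian filtering over 'valid' windows (rows, then columns)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
         k1: float = 0.01, k2: float = 0.03) -> float:
    """Mean SSIM over Gaussian-weighted local windows of two grayscale images."""
    x = x.astype(np.float64)  # convert to floating point first
    y = y.astype(np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    g = gaussian_kernel()
    mu_x, mu_y = gaussian_filter(x, g), gaussian_filter(y, g)   # local means
    var_x = gaussian_filter(x * x, g) - mu_x ** 2               # local variances
    var_y = gaussian_filter(y * y, g) - mu_y ** 2
    cov_xy = gaussian_filter(x * y, g) - mu_x * mu_y            # local covariance
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))
```

Both screenshots would need to be grayscale arrays of the same size (at least 11×11). How Unveil turns the SSIM value into its dissimilarity score is defined in Section 3.3.2 and is not reproduced here.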
The system also employs Tesseract-OCR [91], an open source OCR engine, to extract text from selected areas of the screenshots. To analyze the text extracted from the images, we collected more than 10,000 unique ransom notes from different ransomware families. We first clustered the ransom notes based on family type and visual appearance. For each cluster, we then extracted the ransom text from the corresponding notes and performed pre-filtering to remove unnecessary words (e.g., articles, pronouns) to avoid obvious false positive cases. The result is a word list for each family cluster that can be used to identify ransom notes and, furthermore, to label notes belonging to a known ransomware family.
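A minimal sketch of word-list matching on OCR output. The text does not say how a family word list is derived or how a match is scored, so three things here are hypothetical: the filler-word list, the rule that a family list is the set of words shared by all of that family's notes, and the 0.5 minimum overlap.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical pre-filter list (articles, pronouns, and similar filler words).
STOP_WORDS = {"a", "an", "the", "you", "your", "we", "our", "it", "they",
              "to", "of", "and", "is", "are", "will", "be"}


def normalize(text: str) -> Set[str]:
    """Lower-case OCR output, keep alphabetic tokens, and drop filler words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_family_wordlist(notes: Iterable[str]) -> Set[str]:
    """Assumption: a family's word list is the set of words shared by all its notes."""
    return set.intersection(*(normalize(n) for n in notes))


def best_family_match(ocr_text: str, wordlists: Dict[str, Set[str]],
                      min_overlap: float = 0.5) -> Tuple[Optional[str], float]:
    """Score OCR text by the fraction of each family word list it contains."""
    words = normalize(ocr_text)
    best, best_score = None, 0.0
    for family, wordlist in wordlists.items():
        if not wordlist:
            continue
        score = len(words & wordlist) / len(wordlist)
        if score > best_score:
            best, best_score = family, score
    if best_score < min_overlap:  # not enough ransom-note vocabulary
        return None, best_score
    return best, best_score
```

Scoring against the fraction of a family's list, rather than raw word counts, keeps short lists from being swamped by long OCR output.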
3.5 Evaluation
We evaluated Unveil with two experiments. The goal of the first experiment is to demonstrate that the system can detect known ransomware samples, while the goal of the second experiment is to demonstrate that Unveil can detect previously unknown ransomware samples.
3.5.1 Experimental Setup
The Unveil prototype is built on top of Cuckoo Sandbox [37]. Cuckoo provides basic
services such as sample submission, managing multiple VMs, and performing simple
human interaction tasks such as simulating user input during an analysis. However,
in principle, Unveil could be implemented using any dynamic analysis system (e.g.,
BitBlaze [7], VxStream Sandbox [87]).
We evaluated Unveil using 56 VMs running Windows XP SP3 on a Ganeti cluster based on Ubuntu 14.04 LTS. While Unveil does not require Windows XP, it was chosen because it is well supported by Cuckoo Sandbox. Each VM had multiple NTFS drives. We took anti-evasion measures against popular fingerprinting tricks, such as changing the IP address range and the MAC addresses of the VMs, to prevent malware from fingerprinting the VMs. Furthermore, we permitted controlled access to the Internet via a filtered host-only adapter. In particular, the filtering allowed limited IRC, DNS, and HTTP traffic so samples could communicate with C&C servers. SMTP traffic was redirected to a local honeypot to prevent spam, and network bandwidth was limited to mitigate potential DoS attacks.
The operating system image inside the malware analysis system included typical user data such as saved social networking credentials and a valid browsing history. For each operating system image, multiple users were defined to run the experiments. We also ran a script that emulated basic user activity while the malware sample was running on the system, such as launching a browser and navigating to multiple websites, or clicking on the desktop. This interaction was randomly generated, but was constant across runs. Each sample was executed in the analysis environment for 20 minutes. As described in Sections 3.3.1 and 3.3.2, user environments were generated for each run, filesystem I/O traces were recorded, and pre- and post-execution screenshots were captured. After each execution, the VM was rolled back to a clean state to prevent any interference across executions. All experiments were performed according to well-established experimental guidelines [94] for malware experiments.
3.5.2 Ground Truth (Labeled) Dataset
In this experiment, we evaluated the effectiveness of Unveil on a labeled dataset, and ran different screen locker samples to determine the best threshold value τsim for the large-scale experiment.
We collected ransomware samples from public repositories [1, 4] and online forums that share malware samples [3, 71]. We also received labeled ransomware samples from two well-known anti-malware companies. In total, we collected 3,156 recent samples. To make sure that these samples were indeed active ransomware, we ran them in our test environment and confirmed 2,121 samples to be active ransomware instances. After each run, we checked the filesystem activity of each sample for any signs of attacks on user data. If we did not see any malicious filesystem activity, we checked whether running the sample displayed a ransom note.
Table 3.1 describes the ransomware families we used in this experiment. We note that the dataset covers the majority of the current ransomware families in the wild. In addition to the labeled ransomware dataset, we also created a dataset that consisted of non-ransomware samples. These samples were submitted to the Anubis analysis platform [47], and consisted of a collection of benign as well as malicious samples. We selected 149 benign executables, including applications that have ransomware-like behavior such as secure deletion, encryption, and compression. A short list of these applications is provided in Table 3.2. We also tested 384 non-ransomware malware samples from 36 malware families to evaluate the false positive rate of Unveil.
Family          Type     Samples
Cryptolocker    crypto   33 (1.5%)
CryptoWall      crypto   42 (2.0%)
CTB-Locker      crypto   77 (3.6%)
CrypVault       crypto   21 (1.0%)
CoinVault       crypto   17 (0.8%)
Filecoder       crypto   19 (0.9%)
TeslaCrypt      crypto   39 (1.8%)
Tox             crypto   71 (3.3%)
VirLock         locker   67 (3.2%)
Reveton         locker   501 (23.6%)
Tobfy           locker   357 (16.8%)
Urausy          locker   877 (41.3%)
Total Samples   -        2,121
Table 3.1: The list of ransomware families used in the first experiment.
Table 3.3 shows an example of I/O traces for CryptoWall 3.0 and CryptoWall 4.0 in which the victim's file is first read and then overwritten with an encrypted version. The I/O access patterns that CryptoWall 3.0 and 4.0 samples use to overwrite the content of the files are identical, since both versions use the same cryptosystem. The main difference is that CryptoWall 4.0 modifies the filenames and extensions with random characters, probably to minimize the chance of recovering the files based on their names in the Master File Table (MFT) in the NTFS filesystem.
Application   Main Capability   Version
7-zip         Compression       15.06
Winzip        Compression       19.5
WinRAR        Compression       5.21
DiskCryptor   Encryption        1.1.846.118
AESCrypt      Encryption        -
Eraser        Shredder          6.2.0.2969
SDelete       Shredder          1.61
Table 3.2: The list of benign applications that generate I/O access patterns similar to those of ransomware.
Run            OP       Proc           FName            Offset      Entropy
CryptoWall 3   read     explorer.exe   document.cad     [0, 4096)   5.21
               write    explorer.exe   document.cad     [0, 4096)   7.04
               ...
CryptoWall 4   read     explorer.exe   project.cad      [0, 4096)   5.21
               write    explorer.exe   project.cad      [0, 4096)   7.11
               ...
               rename   explorer.exe   t67djkje.elkd8
Table 3.3: An example of I/O access in Unveil for CryptoWall 3.0 and CryptoWall 4.0.
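Table 3.3 reports an entropy value for each data buffer. Assuming this is byte-level Shannon entropy in bits per byte (so values range from 0 to 8, and encrypted data approaches 8), it can be computed as sketched below. As stated later in this section, Unveil's detection itself does not use entropy; it is only an evaluation aid.

```python
import math
import os
from collections import Counter


def shannon_entropy(buffer: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not buffer:
        return 0.0
    total = len(buffer)
    counts = Counter(buffer)  # frequency of each byte value 0..255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    text_like = b"Quarterly project notes for the CAD drawings. " * 90
    print(round(shannon_entropy(text_like[:4096]), 2))  # structured data: well below 8
    print(round(shannon_entropy(os.urandom(4096)), 2))  # encrypted-looking data: close to 8
```

A jump from roughly 5 to above 7 bits per byte at the same offset, as in Table 3.3, is the kind of change expected when plaintext is overwritten with ciphertext.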
Filesystem Activity of Benign Applications with Potential Ransomware-like Behavior
One question that arises is whether benign applications such as encryption or compression programs might generate similar I/O request sequences, resulting in false positives. Note that benign applications treat the original file content carefully, since their ultimate goal is to generate an encrypted version of the original file, and not to restrict access to the file. Therefore, by default these applications leave the original files intact after encryption or compression. If the user deliberately activates automatic deletion after encryption, this can potentially result in a false positive (see Figure 3-2.2). However, in our approach, we assume that the usual default behavior is exhibited and the original data is preserved. We believe that this is a reasonable assumption, considering that we are building an analysis system that will mainly analyze potentially suspicious samples captured and submitted for analysis. Nevertheless, we investigated the I/O access patterns of benign programs and compared them with those of ransomware samples, as shown in Table 3.4. The I/O traces indicate that the benign programs exhibit distinguishable I/O access patterns as a result of their default behavior.
Application   OP       Description
CrypVault     read     read low entropy buffer from original file
              write    write high entropy buffer to a new file
              ...
              write    overwrite the buffer of the original file
              delete   read attributes, delete the original file
CryptoWall4   read     read low entropy buffer
              write    overwrite with high entropy buffer
              ...
              rename   read attributes, rename the files
SDelete       write    overwrite data buffer
              ...
              delete   read attributes, delete the file
7-zip         read     read data buffer from original file
              write    write data buffer to a new file
              ...
Table 3.4: I/O accesses for deletion and compression mechanisms in benign/malicious applications.
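The patterns in Table 3.4 differ in which file receives the writes and what happens to the original afterwards. A minimal sketch of labeling one process's accesses to one user file along those lines follows. The (operation, target) encoding and the label names are hypothetical, this is not Unveil's classifier (which follows the attack classes of Figure 3-2), and a label alone does not decide maliciousness: SDelete is benign but destroys the original by design.

```python
from typing import Iterable, Tuple

# Each access is (operation, target): operation in {"read", "write", "delete",
# "rename"}; target is "original" (the user file) or "new" (a file the process created).
FileAccess = Tuple[str, str]


def label_pattern(accesses: Iterable[FileAccess]) -> str:
    """Summarize one process's accesses to one user file (hypothetical labels)."""
    ops = list(accesses)
    wrote_original = ("write", "original") in ops
    wrote_new = ("write", "new") in ops
    deleted = ("delete", "original") in ops
    renamed = ("rename", "original") in ops
    if wrote_original and renamed:
        return "overwrite-and-rename"        # e.g. CryptoWall4 in Table 3.4
    if wrote_new and (deleted or wrote_original):
        return "copy-then-destroy-original"  # e.g. CrypVault
    if wrote_original and deleted:
        return "overwrite-and-delete"        # e.g. SDelete (secure deletion)
    if wrote_original:
        return "overwrite-in-place"
    if wrote_new:
        return "copy-original-intact"        # e.g. 7-zip default behavior
    return "read-only"
```

The rule order matters: a process that writes a new file and then destroys the original is labeled as such before the generic overwrite cases are considered.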
Benign applications might not necessarily perform encryption or deletion on user files, but can change the content of the files. For example, updating the content of a Microsoft PowerPoint file (e.g., embedding images and media) generates I/O requests similar to ransomware (see Figure 3-2.1). However, the key difference here is that such applications usually generate I/O requests for a single file at a time, and the I/O requests are not repeated over multiple user files. Also, note that benign applications typically do not arbitrarily encrypt, compress, or modify user files, but rather need explicit input from users (e.g., file names, keys, options). Hence, most applications would simply exit, or wait for input, when executed in Unveil.
Figure 3-3: Precision-recall analysis of the tool (marked operating point: τsim = 0.32).
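The distinction between a single edited document and repeated modification of many user files can be sketched as a per-process count of distinct generated files that receive destructive requests. The (process, operation, path) event format and the threshold of three files are hypothetical; Unveil's actual reporting criterion follows the attack classes of Figure 3-2.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Event = Tuple[str, str, str]  # (process, operation, path) from the I/O log


def repeated_destructive_access(events: Iterable[Event], user_files: Set[str],
                                min_files: int = 3) -> Set[str]:
    """Return processes whose writes/deletes/renames hit at least min_files
    distinct generated user files (min_files is a hypothetical threshold)."""
    touched = defaultdict(set)
    for process, operation, path in events:
        if path in user_files and operation in {"write", "delete", "rename"}:
            touched[process].add(path)  # count each user file once per process
    return {p for p, paths in touched.items() if len(paths) >= min_files}
```

Counting distinct files rather than requests means that an editor saving the same presentation many times still counts as a single file.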
Similarity Threshold
We performed a precision-recall analysis to find the best similarity threshold τsim for desktop locking detection. The best threshold value to discriminate between similar and dissimilar screenshots should be defined in such a way that Unveil is able to detect screen locker ransomware while maintaining an optimal precision-recall trade-off.
Evaluation Results
Total Samples         148,223
Detected Ransomware   13,637 (9.2%)
Detection Rate        96.3%
False Positives       0.0%
New Detection         9,872 (72.2%)
Table 3.5: Unveil detection results.
Figure 3-3 shows empirical precision-recall results when varying τsim. As the figure shows, with τsim = 0.32, more than 97% of the ransomware samples across both screen and file locker samples are detected with 100% precision. In the second experiment, we used this similarity threshold to detect screen locker ransomware in a malware feed unknown to Unveil.
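The threshold sweep can be sketched as computing precision and recall at each candidate value, flagging a sample when its dissimilarity score is at or above the threshold. The selection criterion (highest recall subject to 100% precision) is a hypothetical formalization of choosing an operating point like τsim = 0.32, not the thesis's stated procedure.

```python
from typing import Iterable, Optional, Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when samples with score >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores: Sequence[float], labels: Sequence[bool],
                   candidates: Iterable[float]) -> Optional[Tuple[float, float, float]]:
    """Hypothetical criterion: maximize recall subject to 100% precision.
    Returns (threshold, precision, recall), or None if no candidate qualifies."""
    best = None
    for t in sorted(candidates):
        p, r = precision_recall(scores, labels, t)
        if p == 1.0 and (best is None or r > best[2]):
            best = (t, p, r)
    return best
```

Scanning candidates in ascending order means that, among thresholds with equal recall, the lowest one is kept.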
3.5.3 Detecting Zero-Day Ransomware
The main goal of the second experiment is to evaluate the accuracy of Unveil when
applied to a large dataset of recent real-world malware samples. We then compared
our detection results with those reported by AV scanners in VirusTotal.
This dataset was acquired from the daily malware feed that Anubis [47] provides to security researchers. The samples were collected from May 18th 2015 until February 12th 2016. The feed is generated from the Anubis submission queue, which is in turn fed by Internet users and security companies. Hence, before performing the experiment, we filtered the incoming Anubis samples by removing those that were clearly not executables (e.g., PDFs, images). After this filtering step, the dataset contained 148,223 distinct samples. Each sample was then submitted to Unveil to obtain I/O access traces and pre-/post-execution desktop image dissimilarity scores.
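The text does not say how non-executables were filtered out. One plausible header-based check, assuming the targets of interest are Windows PE executables, is sketched below: a PE file starts with the DOS "MZ" magic, and the 4-byte little-endian value at offset 0x3C points to the "PE" signature followed by two zero bytes.

```python
def looks_like_pe(header: bytes) -> bool:
    """Cheap check for a Windows PE executable from the first bytes of a file.

    The caller should pass enough leading bytes (e.g. the first 4096) to cover
    the offset stored at 0x3C; otherwise the check conservatively fails.
    """
    if len(header) < 0x40 or header[:2] != b"MZ":
        return False  # too short, or no DOS header magic (PDFs, images, ...)
    pe_offset = int.from_bytes(header[0x3C:0x40], "little")
    return header[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

Checking both signatures, rather than only the leading "MZ", reduces the chance of keeping arbitrary files that happen to begin with those two bytes.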
Detection Results
Table 3.5 shows the evaluation results of the second experiment. With the similarity threshold τsim = 0.32, Unveil labeled 13,637 samples (9.2% of the dataset) in the Anubis malware feed as ransomware; these included both file locker and desktop locker samples.
Evaluation of False Positives. As we did not have a labeled ground truth in
the second experiment, we cannot provide an accurate precision-recall analysis, and
verifying the detection results is clearly challenging. For example, re-running samples
while checking for false positives is not feasible in all cases since samples may have
become inactive at the time of re-analysis (e.g., the C&C server might have been
taken down).
Hence, we manually verified the detection results. That is, for the samples that were detected as screen locker ransomware, we manually checked the post-attack screenshots taken by Unveil. The combination of the structural similarity test and OCR-based text extraction provides reliable automatic detection for this class of ransomware. We confirmed that Unveil correctly reported 4,936 samples that delivered a ransom note during the analysis.
Recall that Unveil reports a sample as file locker ransomware if its I/O access pattern follows one of the three classes of ransomware attacks described in Figure 3-2. For file locker ransomware samples, we used the I/O reports for each sample. We listed all the I/O activities on the first five user files during that run and looked for suspicious I/O activity such as write and/or delete requests. Note that the detection approach used in Unveil is based only on the I/O access pattern. We do not check for changes in entropy in the detection phase; entropy is only used for our evaluation.
If we found multiple write or delete I/O requests to the first five generated user files, and also either a significant increase in entropy between the read and write data buffers at a given file offset or the creation of new high entropy files, we confirmed the detection as a true positive. In our tests, the creation of multiple new high entropy files based on user files is a reliable sign of ransomware. For example, a malware sample that uses secure deletion techniques may overwrite files with low entropy data. However, the malicious program first needs to generate an encrypted version of the original files. In any case, generating high entropy data raises an alarm in our evaluation.
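The evaluation-only confirmation rule can be sketched as follows, reading it as: multiple write/delete requests to the first five user files, plus either an entropy jump between a read and a write at the same offset or a newly created high-entropy file. The record type and all three thresholds (two requests, an increase of 1 bit per byte, 7.5 bits per byte for "high") are hypothetical; this is not part of Unveil's detector.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IORecord:
    op: str          # "read", "write", or "delete"
    path: str        # target file
    offset: int      # start offset of the data buffer
    entropy: float   # entropy of the data buffer, bits per byte (0 to 8)


def confirm_true_positive(records: List[IORecord], first_five: Iterable[str],
                          new_file_entropies: Iterable[float],
                          min_requests: int = 2, min_increase: float = 1.0,
                          high_entropy: float = 7.5) -> bool:
    """Evaluation-only check of a file-locker detection (thresholds hypothetical)."""
    targets = set(first_five)
    hits = [r for r in records if r.path in targets and r.op in ("write", "delete")]
    if len(hits) < min_requests:  # need multiple write/delete requests
        return False
    # Entropy of data read from each (file, offset), for comparison with writes.
    read_entropy = {(r.path, r.offset): r.entropy
                    for r in records if r.op == "read" and r.path in targets}
    entropy_jump = any(
        r.op == "write" and (r.path, r.offset) in read_entropy
        and r.entropy - read_entropy[(r.path, r.offset)] >= min_increase
        for r in hits)
    new_high_entropy = any(e >= high_entropy for e in new_file_entropies)
    return entropy_jump or new_high_entropy
```

With the CryptoWall 3.0 values from Table 3.3 (read at 5.21, write at 7.04 bits per byte at offset 0), the jump of 1.83 would satisfy the hypothetical 1.0 threshold.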
By employing these two approaches and analyzing the results, we did not find any false positives. A few cases showed a significant change in the structure of the images. Our closer investigation revealed that in one case the installed program generated a large installation page, showed some unreadable characters in the window, and did not close even when a button was clicked (i.e., non-functional buttons). In another case, the program generated a large setup window, but did not proceed due to a crash. These cases produce a dissimilarity score higher than the threshold value. However, since the text extracted from those windows did not contain any ransomware-related content, Unveil correctly categorized them as non-ransomware samples.
Evaluation of False Negatives. Determining false negative rates is a challenge
since manually checking 148,223 samples is not feasible. In the following, we provide
an approximation of false negatives for Unveil.
In our tests on the labeled dataset, false negatives mainly occurred in samples that make persistent changes on the desktop but are not detected as ransomware by Unveil because their pre-/post-attack dissimilarity score is less than τsim = 0.32. Our analysis of labeled samples from multiple ransomware families (see Section 3.5.2) shows that these cases were mainly observed in samples with a dissimilarity score in the interval [0.18, 0.32). This is because for lower dissimilarity scores, the changes in the screenshots are negligible or small (e.g., Windows warning/error messages). Consequently, to increase the chance of catching false negative cases, we selected all the samples whose dissimilarity score was in [0.18, 0.32). This reduced the set of potential desktop locker ransomware samples missed by Unveil to 4,642 samples. We manually checked the post-attack screenshots of these samples, and found 377 desktop locker ransomware samples that Unveil was not able to detect. Our analysis shows that these false negatives resulted from samples of one ransomware family that generated a very transparent, hard-to-read ransom note with a dissimilarity score in [0.27, 0.31].
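The selection of manual-review candidates can be sketched as a three-way split on the dissimilarity score, using the interval bounds from the text (0.18 and τsim = 0.32). The function name and dictionary input are hypothetical. As we read the text, samples above the threshold are still subject to the OCR check before being reported.

```python
from typing import Dict, Set, Tuple


def triage_by_dissimilarity(scores: Dict[str, float], lower: float = 0.18,
                            tau_sim: float = 0.32) -> Tuple[Set[str], Set[str]]:
    """Split samples by their pre-/post-execution dissimilarity score.

    Returns (above, review): samples at or above tau_sim, and samples in
    [lower, tau_sim) that are candidates for manual false-negative review.
    Samples below lower are not reviewed.
    """
    above, review = set(), set()
    for sample, score in scores.items():
        if score >= tau_sim:
            above.add(sample)
        elif score >= lower:
            review.add(sample)
    return above, review
```

The half-open interval mirrors [0.18, 0.32): a sample scoring exactly 0.32 already meets the threshold and is not a missed detection.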
For file locker ransomware, we first removed the samples that were not detected as malware by any of the AV scanners in VirusTotal after multiple resubmissions on consecutive days (see Section 3.5.3). This reduced the number of samples to check by 47%. We then applied the approach described above: we listed the first five user files generated for each sample run and checked whether any process requested write access to those files. We also checked the entropy of multiple data buffers. If we identified write access with a significant increase in the entropy of the written data buffers compared to the entropy of the data buffers read from those files, we reported the sample as a false negative.
Our tests show that Unveil did not have any false negatives among file locker ransomware samples. Consequently, we conclude that Unveil is able to detect multiple classes of ransomware attacks with a low false positive rate (0.0% false positives at a 96.3% true positive rate).
Early Warning
One of the design goals of Unveil is to be able to automatically detect previously
unknown (i.e., zero-day) ransomware. In order to run this experiment, we did the
following. Once per day over the course of the experiment, we built a malware dataset
that was concurrently submitted to Unveil and VirusTotal. If a sample was detected
as ransomware by Unveil, we checked the VirusTotal (VT) detection results. In cases
where a ransomware sample was not detected by any VT scanner, we reported it as
a new detection.
In addition, we also measured the lag between a new detection by Unveil and a VT detection. To that end, we created a dataset from the newly detected samples submitted on days {1, 2, ..., n−1, n} and re-submitted these samples to see whether the detection results changed. We considered the results of all 55 VT scanners in this experiment. Since the number of scanners is relatively high, we defined the VT detection ratio ρ as the ratio of the number of scanners that identified the sample as ransomware or malware to the total number of scanners checked by VT. ρ is therefore a value in the interval [0, 1], where zero means that the sample was not detected by any of the 55 VT scanners, and 1 means that all scanners reported the sample as malware or ransomware. Since there is no standard labeling scheme for malware in the AV industry, one scanner can label a sample with a completely different name from another scanner. Consequently, to avoid biased results, we consider the labeling of a sample under any name as a successful detection.
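The detection ratio ρ can be sketched as a small function over per-scanner results. The input format (scanner name mapped to a label, or None when the scanner did not flag the sample) and the scanner names are hypothetical; this is not the VirusTotal API's response format.

```python
from typing import Mapping, Optional


def vt_detection_ratio(results: Mapping[str, Optional[str]]) -> float:
    """rho = (# scanners that flagged the sample, under any label) / (# scanners checked).

    Any non-empty label counts as a detection, because vendors use
    different naming schemes for the same sample.
    """
    if not results:
        raise ValueError("no scanner results")
    flagged = sum(1 for label in results.values() if label)
    return flagged / len(results)
```

For example, two detections (under different names) out of four scanners give ρ = 0.5.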
Figure 3-4: Evolution of VT scanner reports after six submissions. (Panels: Submission #1 to Submission #6; x-axis: pollution ratio, 0.0 to 1.0; y-axis: density distribution.)
In our experiment, we resubmitted the detected samples every day to see how the VT detection ratio ρ changes over time. The distribution of ρ for each submission is shown in Figure 3-4. For the first submission, 72.2% of the ransomware samples detected by Unveil were not detected by any of the 55 VT scanners. After a small number of subsequent submissions, ρ no longer changed significantly, and it was generally concentrated at either small or very large ratios. This means that after a few resubmissions, either only a few scanners detected a sample, or almost all the scanners detected it.
Figure 3-5: I/O activities of a previously unknown ransomware family detected by Unveil. (Panels: QUERY OP, READ OP, WRITE OP; x-axis: analysis time in seconds, 0 to 600; y-axis: distribution (%). Annotated phases: userspace file fingerprinting, creating a list of files, and periodic file encryption, separated by sleep time.)
3.5.4 Case Study: Automated Detection of a New Ransomware Family
In this section, we describe a new ransomware family, called SilentCrypt, that was detected by Unveil during the experiments. After our system detected these samples and we submitted them to VirusTotal, several AV vendors picked up on them and started detecting them a couple of days later, confirming that the samples we automatically detected were malicious.
This family uses a unique and effective method to fingerprint the runtime environment of the analysis system. Unlike other malware samples that check for specific artifacts such as registry keys, background processes, or platform-specific characteristics, this family checks the private files of a user to determine if the code is running in an analysis environment. When the sample is executed, it first checks the number of files in the user's directories, and sends the list of files to the C&C server before starting the attack.
Multiple online malware analysis systems such as malwr.com and Anubis, as well as a modern sandboxing technology provided by a well-known anti-malware company, did not register any malicious activity for this sample. However, the sample showed heavy encryption activity when analyzed by Unveil.
An analysis of the I/O activity of this sample revealed that this family first waited for several minutes before attacking the victim's files. Figure 3-5 shows the three main I/O activities of one of the samples in this family. The sample traverses the current user's main directories and creates a list of files and folders. If the sample receives permission to attack from the C&C server, it begins encrypting the targeted files.
To confirm Unveil's alerts, we conducted a manual investigation over several days. Our analysis concluded that the malicious activity starts only if user activity is detected. Unlike other ransomware samples that immediately attack a victim's files when they are executed, this family only encrypts files that the user has recently opened while the malicious process is monitoring the environment. That is, the malicious process reads a file's data and overwrites it with encrypted data if the file is used. The file name is then changed to "filename.extension.locked_forever" after it has been encrypted.
Unveil was able to detect this family of ransomware automatically because the sample was triggered when the system accessed some of the generated user files as part of the user activity emulation scripts. Once we submitted the sample to VirusTotal, it was picked up by other AV vendors (5/55) after five days, under different labels. A well-known sandboxing-based security company confirmed our findings that the malware sample was a new threat that they had not detected before. We provide an anonymized video of a sample from this ransomware family in [10].
3.6 Discussion and Limitations
The evaluation in Section 3.5 demonstrates that Unveil achieves practical and useful detection results on a large, real-world dataset. Unfortunately, malware authors continuously observe defensive advances and adapt their attacks accordingly. In the following, we discuss the limitations of Unveil and potential evasion strategies.
There is always the possibility that attackers will find ways to fingerprint the automatically generated user environment and avoid it. However, this comes at a high cost and raises the difficulty bar for the attacker. For example, desktop locking ransomware can use heuristics to look for specific user interaction before locking the desktop (e.g., waiting for multiple login events or counting the number of user clicks). However, implementing these approaches can potentially make detection easier, since they require hooking specific functions in the operating system. Such hooking behavior is itself suspicious and is used by current malware analysis systems to detect different classes of malware. Furthermore, these approaches delay launching the attack, which increases the risk of being detected by AV scanners on clients before a successful attack occurs.
Another possibility is that malware might encrypt only a specific part of a file instead of aggressively encrypting the entire file, or simply shuffle the file content using a specific pattern that makes the files unreadable. Although we have not seen any sample with these behaviors, developing such ransomware is quite possible. The key point is that to perform such activities, the malicious program must open the file with write permission and manipulate at least some data buffers of the file content. If the malicious program accesses the files in this way, Unveil will still see this activity. There is no real reason for benign software to open automatically generated files with write permission and modify their content. Consequently, such activities will still be logged. Malware authors might also use other techniques to notify the victim and evade the desktop lock monitor. As an example, the ransomware may display the ransom note via video or audio files rather than locking the desktop. As discussed above, these approaches only make sense if the malware is able to successfully encrypt user files first. In this case, Unveil can identify those malicious filesystem accesses as discussed in Section 4.4.
We also believe that the current implementation of text extraction to detect desktop locker ransomware can be improved. We observed that the change in the structure of the desktop screenshots is enough to detect a large number of current ransomware attacks, since Unveil exploits the attacker's goal, which is to ensure that the victims see the ransom note. However, we believe that the text extraction module can be improved to detect possible evasion techniques an attacker could use to generate the ransom note (e.g., using uncommon words in the ransom text).
Clearly, there is always the possibility that an attacker will be able to fingerprint the dynamic analysis environment. For example, stalling code [61] has become increasingly popular to prevent the dynamic analysis of a sample. Such code takes longer to execute in a virtual environment, preventing execution from completing during an analysis. Also, attackers can actively look for signs of dynamic analysis (e.g., signs of execution in a VM such as well-known hard disk names). Note that Unveil is agnostic to the underlying dynamic analysis environment. Hence, as a mitigation, Unveil can use a sandbox that is more resistant to these evasion techniques (e.g., [61, 105]). The main contribution of Unveil is not the dynamic analysis of malware, but rather the introduction of new techniques for the automated, specific detection of ransomware during dynamic analysis.
Unveil runs within the kernel, and aims to detect user-level ransomware. As a result, there is the risk that ransomware may run at the kernel level and thwart some of the hooks Unveil uses to monitor the filesystem. However, this would require the ransomware to run with administrator privileges to load kernel code or exploit a kernel vulnerability. Currently, most ransomware runs as user-level programs because this is sufficient to carry out ransomware attacks. Kernel-level attacks would require more sophistication, and would increase the difficulty bar for the attackers. Also, if additional resilience is required, the kernel component of Unveil could be moved outside of the analysis sandbox.
3.7 Related Work
Many approaches have been proposed to date that aim to improve the analysis and detection of malware. These approaches describe program behavior using techniques ranging from analyzing byte patterns [68, 100, 95, 108] to transparently running programs in malware analysis systems [5, 58, 57, 104]. Early steps to analyze and capture the main intent of a program focused on the analysis of control flow. For example, Kruegel et al. [65] and Bruschi et al. [25] showed that by modeling programs based on their instruction-level control flow, it is possible to bypass some forms of obfuscation. Similarly, Christodorescu et al. [32] used instruction-level control flow to design obfuscation-resilient detection systems. Later work focused on analyzing and detecting malware using higher-level semantic characterizations of their runtime behavior derived from sequences of system call invocations and OS resource accesses [59, 60, 31, 72, 98, 109].
Similar to our use of automatically generated user content, decoys have been used in the past to detect security breaches. For instance, the use of decoy resources has been proposed to detect insider attacks [24, 112]. Recently, Juels et al. [50] used honeywords to improve the security of hashed passwords. The authors show that decoys can improve the security of hashed passwords, since an attempt to use a decoy password for logins results in an alarm. In other work, Nikiforakis et al. [83] used decoy files to detect illegally obtained data from file hosting services.
There have also been some recent reports on the ransomware threat. For example, security vendors have reported on the potential threat of ransomware attacks based on the number of infections that they have observed [103, 11, 101, 86]. A first report on specific ransomware families was made by Gazet, who analyzed three ransomware families including Krotten and Gpcode [43]. The author concluded that while these early families were designed for massive propagation, they did not fulfill the basic requirements for mass extortion (e.g., sufficiently long encryption keys). Recently, Kharraz et al. [56] analyzed 15 ransomware families and provided an evolution-based study of ransomware attacks. They performed an analysis of charging methods and the use of Bitcoin for monetization. They proposed several high-level mitigation strategies such as the use of decoy resources to detect suspicious file access. Their assumption is that every filesystem access to delete or encrypt decoy resources is malicious and should be reported. However, they did not implement any concrete solution to detect or defend against these attacks.
We are not aware of any systems that have been proposed in the literature that specifically aim to detect ransomware in the wild. In particular, in contrast to existing work on generic malware detection, Unveil detects behavior specific to ransomware (e.g., desktop locking, patterns of filesystem accesses).
3.8 Conclusions
In this chapter we presented Unveil, a novel approach to detecting and analyzing ransomware. Our system is the first in the literature to specifically identify typical behavior of ransomware, such as malicious encryption of files and locking of user desktops. These behaviors are difficult for ransomware to hide or change. The evaluation of Unveil shows that our approach correctly detected 13,637 ransomware samples from multiple families in a real-world data feed with zero false positives. In fact, Unveil outperformed all existing AV scanners and a modern industrial sandboxing technology in detecting both superficial and technically sophisticated ransomware attacks. Among our findings was also a new ransomware family that no security company had detected before we submitted it to VirusTotal.
Chapter 4
Protecting End-Points from Ransomware Attacks
4.1 Introduction
Ransomware continues to be one of the most important security threats on the Internet. While ransomware is not a new concept (such attacks have been in the wild since the previous decade), the growing number of high-profile ransomware attacks [12, 29, 34, 44] has raised concern about how to defend against this class of malware. In 2016, several organizations in the public and private sectors, including the healthcare industry, were impacted by ransomware [22, 13, 107]. Recently, US officials have also expressed their concerns about ransomware [38, 49], and even asked the U.S. government to focus on fighting ransomware under the Cybersecurity National Action Plan [49].
In response to the increasing ransomware threat, users are often advised to create backups of their critical data. Certainly, a reliable data backup policy minimizes the potential costs of a ransomware infection and is an important part of the IT management process. However, the growing number of paying victims [15, 81, 40] suggests that unsophisticated users, who are the main target of these attacks, do not follow these recommendations and easily become paying victims of ransomware. Hence, ransomware authors continue to create new attacks and evolve their creations, as evidenced by the emergence of more sophisticated ransomware every day [103, 11, 101, 86].
Law enforcement agencies and security firms have recently launched a program to assist ransomware victims in retrieving their data without paying ransom fees to cybercriminals [84]. The main idea behind this partnership is that reverse engineers analyze the cryptosystems used by the malware to extract secret keys or find design flaws in the way the sample encrypts or deletes files. While some ransomware families are infamous for using weak cryptography [56, 28, 67], newer variants, unfortunately, have learned from past mistakes and rely on strong cryptographic primitives provided by standard cryptographic libraries. Given the increasing number of ransomware attacks, a desirable and complementary defense would be to augment the operating system with transparent techniques that make it resistant to ransomware-like behavior. Such an endpoint approach to defending against unknown ransomware attacks would need to stop attacks immediately once the ransomware starts destroying files, and should be able to recover any lost data.
This work presents a generic, real-time ransomware protection approach that overcomes the limitations of existing approaches with regard to detecting ransomware. Our technique is based on two main components. First, an abstract characterization of the behavior of a large class of current ransomware attacks is constructed. More precisely, our technique applies the results of a long-term dynamic analysis to binary objects to determine whether a process matches the abstract model; a process is labeled as malicious if it exhibits behaviors that match the model. Second, Redemption employs a high-performance, high-integrity mechanism to protect and restore all attacked files by using a transparent data buffer to redirect access requests while tracking the write contents.
In this work, we demonstrate that by augmenting the operating system with a set of lightweight and generic techniques, which we collectively call Redemption, it is possible to stop modern ransomware attacks without changing the semantics of the underlying file system's functionality or making significant changes to the architecture of the operating system. Our experiments on 29 contemporary ransomware families show that our approach can be applied in an application-transparent manner and can significantly enhance current protection against ransomware (achieving a true positive [TP] rate of 100% at 0.8% false positives [FPs]). Finally, we show that this goal can be achieved without a discernible performance impact or other changes to the way users interact with standard operating systems.
To summarize, we make the following contributions:
• We present a general approach to defending against unknown ransomware attacks in a transparent manner. In this approach, access to user files is mediated, and privileged requests are redirected to a protected area, maintaining the consistent state of user data.
• We show that efficient ransomware protection with zero data loss is possible.
• We present a prototype implementation for Windows, and evaluate it with real users to show that the system is able to protect user files during an unknown ransomware attack while imposing no discernible performance overhead.
The rest of the chapter is structured as follows. Section 4.2 presents related work. In Section 4.3, we present the threat model. In Section 4.4, we elaborate on the architecture of Redemption, and in Section 4.5, we describe its detection approach. In Section 4.6, we provide more details about the implementation of the system. In Section 4.7, we present the evaluation results. Limitations of the approach are discussed in Section 4.8. Finally, Section 4.9 concludes the chapter.
4.2 Related Work
The first scientific study on ransomware was performed by Gazet [43], who analyzed three ransomware families and concluded that the techniques incorporated in those samples did not fulfill the basic requirements for mass extortion. The recent resurgence of ransomware attacks has once more attracted the attention of researchers. Kharraz et al. [56] analyzed 15 ransomware families, including desktop lockers and cryptographic ransomware, and provided an evolution-based study of ransomware attacks. The authors concluded that a significant number of ransomware samples in the wild use very similar strategies to attack user files, and can be distinguished from benign processes. In another work, Kharraz et al. [53] proposed Unveil, a dynamic analysis system specifically designed to assist reverse engineers in analyzing the intrinsic behavior of an arbitrary ransomware sample. Unveil is not an end-point solution, and no real end-user interaction was involved in its evaluation. Redemption is an end-point solution that aims to differentiate between benign and malicious ransomware-like access requests to the file system.
Scaife et al. [85] proposed CryptoDrop, which is built upon the premise that the malicious process aggressively encrypts user files. As a limitation of CryptoDrop, the authors state that the tool does not provide any recovery or minimal data loss guarantees. Their approach is able to detect a ransomware attack after a median of ten file losses. Redemption does not have this limitation, as it is designed to protect the consistent state of the original files by providing full data recovery if an attack occurs. Hence, unlike CryptoDrop, Redemption guarantees minimal data loss and is resistant to most realistic evasion techniques that malware authors may use in the future.
Very recently, Continella et al. [35] and Kolodenker et al. [62] concurrently and independently proposed protection schemes to detect ransomware. Continella et al. [35] proposed ShieldFS, which has a similar goal to ours. The authors also look at the file system layer to find typical ransomware activity. While ShieldFS is a significant improvement over the status quo, it would be desirable to complement it with a more generic approach that is also resistant to unknown cryptographic functions. Unlike ShieldFS, Redemption does not rely on identifying cryptographic primitives, which can result in false positives. More importantly, this was a conscious design choice to minimize interference with the normal operation of processes, minimize the risk of process crashes, and avoid intrusive pop-up prompts, which can have noticeable usability side effects.
Kolodenker et al. [62] proposed PayBreak, which securely stores cryptographic encryption keys in a key vault that is used to decrypt affected files after a ransomware attack. Specifically, PayBreak intercepts calls to functions that provide cryptographic operations, encrypts the symmetric encryption keys, and stores the results in the key vault. After a ransomware attack, the user can decrypt the key vault with their private key and decrypt the files without making any payments. The performance evaluation also shows that PayBreak imposes negligible overhead compared to a reference platform. Similar to ShieldFS, PayBreak relies on identifying functions that implement cryptographic primitives. As mentioned earlier, Redemption does not depend on any hooking technique to identify cryptographic functions. Furthermore, the detection accuracy of Redemption is not affected by the type of packer a ransomware family may use to evade common anti-malware systems. This makes Redemption a more generic solution in the same problem space.
The evaluation of Redemption covers a significantly larger number of ransomware families than [35, 85] and shows that it can successfully identify unseen ransomware attacks after observing a median of five exposed files without any data loss. Indeed, Redemption shares some similarity with CryptoDrop, ShieldFS, and PayBreak due to the common characteristics of ransomware attacks. However, extracting such ransomware behavior is not the main contribution of this work, as it has been comprehensively discussed in several security reports. Rather, the contribution of Redemption is a high-performance, data-loss-free end-user protection framework against ransomware that protects the consistent state of the entire user space and can be used as an augmented service to the operating system. We are not aware of any other scientific work that provides this kind of protection against ransomware attacks.
4.3 Threat Model
In this work, we assume that ransomware can employ any standard, popular techniques to attack machines, similar to other types of malware. That is, ransomware can employ several strategies to evade detection, compromise vulnerable machines, and attack user files. For example, a ransomware instance could be started directly by the user, delivered by a drive-by download attack, or installed via a simple dropper or a malicious email attachment.
We also assume that the malicious process can employ any technique to generate the encryption key, use arbitrary encryption key lengths, or, in general, use any customized or standard cryptosystem to lock the files. Ransomware can access sensitive resources by spawning new processes or by injecting code into benign processes (i.e., similarly to other classes of malware). Furthermore, we assume that a user can install and run programs from arbitrary untrusted sources, and therefore, that malicious code can execute with the privileges of the user. This can happen in several scenarios. For instance, a user may install, execute, and grant privileges to a malicious application that claims to be a well-known legitimate application but in fact delivers malicious payloads, including ransomware.
In addition, we assume that the trusted computing base includes the display module, OS kernel, and underlying software and hardware stack. Therefore, we can safely assume that these components of the system are free of malicious code, and that normal user-based access control prevents attackers from running malicious code with superuser privileges. This is a fair assumption, considering that ransomware attacks mainly occur in user mode.
4.4 Design Overview
In this section, we provide our design goals for Redemption. We refer the reader to Section 4.6 for details of our prototype implementation. Redemption has two main components. First, a lightweight kernel module intercepts process interactions, stores the events, and manages the changes in a protected area. Second, a user-mode daemon, called the behavioral monitor and notification module, assigns a malice score to each process and is used to notify the user about potentially malicious behavior of a process.
Figure 4-1: Redemption mediates the access to the file system and redirects each write request on the user files to a protected area without changing the status of the original file.
Intercepting Access Requests. In order to implement a reliable dynamic access control mechanism over user data, this part of the system should be implemented in the kernel and be able to mediate access to the file system. The prototype redirects each write access request on the user files to a protected area without changing the status of the original file. We explain more details on how we implemented the write redirection semantics in Section 4.6.
Figure 4-1 presents an example that illustrates how access requests are processed. In an unmodified system, the request would succeed if the corresponding file exists and the process holds the required permission. The system introduces the following changes. (1) Redemption receives the request A from the application X to access the file F at the time t. (2) If A_t requests access with write or delete privilege to the file F, and the file F resides in a user-defined path, the Redemption monitor is called. (3) Redemption creates a corresponding file in the protected area, called the reflected file, and handles the write requests on it. These changes are periodically flushed to storage to ensure that they are physically available on the disk. After a successful data write at Step 3, the meta-data entry of the corresponding file is updated with the offset and length of the data buffer in the I/O request. (4) The malice score of the process is updated and compared to a pre-configured threshold α. (5) Depending on the calculated malice score, the Redemption monitor sends a notification to the display monitor to alert the user. (6) A success/failure notification is generated and sent to the system service manager.
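The interception decision in Steps (1) and (2) can be sketched as a small predicate. This is a hypothetical user-space model, not the kernel component itself: the `AccessRequest` type, the `should_redirect` helper, and the representation of user-defined paths as a list of Windows-style directory roots are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Iterable


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical model of request A from application X on file F at time t."""
    pid: int          # requesting process (application X)
    path: str         # target file F
    write: bool       # write privilege requested
    delete: bool      # delete privilege requested
    timestamp: float  # time t


def _is_under(path: str, root: str) -> bool:
    """Case-insensitive check that `path` lies inside directory `root`."""
    p = [part.lower() for part in PureWindowsPath(path).parts]
    r = [part.lower() for part in PureWindowsPath(root).parts]
    return len(p) >= len(r) and p[: len(r)] == r


def should_redirect(req: AccessRequest, user_roots: Iterable[str]) -> bool:
    """Step (2): involve the monitor only for write/delete requests on files
    under a user-defined path; all other requests pass through unchanged."""
    if not (req.write or req.delete):
        return False
    return any(_is_under(req.path, root) for root in user_roots)
```

Read-only requests, and requests outside the configured roots, fall through to the unmodified file system path, which matches the text's point that the original semantics are preserved for everything Redemption does not need to mediate.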
Data Consistency. An important requirement for Redemption is to guarantee data consistency during the interaction of applications with the file system. A natural question that arises here is what happens if the end-user confirms that the suspicious operations on a file detected by the system are in fact benign. In this case, a consistency model is essential to preserve the benign changes to the user files without on-disk data corruption. The implementation of the consistency policy should maintain the integrity properties that applications expect from the file system. Failure to do so can lead to corrupted application states and catastrophic data loss. For this reason, the system does not change any file system semantics that may affect the crash guarantees the file system provides. To this end, Redemption operates in three steps: (1) it reads the meta-data generated for the reflected file, creates write requests based on the changed data blocks, and changes the status of these blocks to committed; (2) upon receiving the confirmation notification, the system updates the status in the meta-data of the reflected file from committed to confirmed; and (3) the reflected file is deleted from the protected area. Figure 4-2 briefly illustrates the steps involved in committing the changes to the user data.
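The three-step commit can be illustrated with an in-memory model. This sketch assumes that the reflected file's meta-data can be represented as a list of (offset, data) pairs and that the confirmation notification arrives immediately after the replayed writes complete; the `Reflected` type, the `BlockState` names (including a `PENDING` state for changes not yet committed), and `commit_benign_changes` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class BlockState(Enum):
    PENDING = "pending"      # changes recorded only in the reflected file
    COMMITTED = "committed"  # Step 1 done: blocks written to the original file
    CONFIRMED = "confirmed"  # Step 2 done: commit acknowledged


@dataclass
class Reflected:
    extents: List[Tuple[int, bytes]]  # (offset, data) taken from the meta-data
    state: BlockState = BlockState.PENDING


def commit_benign_changes(original: bytearray, name: str,
                          protected_area: Dict[str, Reflected]) -> None:
    refl = protected_area[name]
    # Step 1: turn each recorded block into a write on the original file.
    for offset, data in refl.extents:
        end = offset + len(data)
        if end > len(original):
            original.extend(b"\x00" * (end - len(original)))
        original[offset:end] = data
    refl.state = BlockState.COMMITTED
    # Step 2: a real system would wait for the confirmation notification here;
    # this model assumes it arrives immediately.
    refl.state = BlockState.CONFIRMED
    # Step 3: delete the reflected file from the protected area.
    del protected_area[name]
```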
Figure 4-2: The steps involved in committing the benign changes to the files. (Diagram: system services in user space; the Redemption monitor, file system driver, and disk in kernel space, with a protected area holding the reflected files and their metadata (offsets, buffer length). The monitor (1) reads the metadata of the reflected files, (2) creates requests and commits the changes to the original files, and (3) returns the result.)
Another question is how the system protects the consistency of the original file if a system crash occurs during the three-step procedure described above. In case of a crash, the system works as follows: (1) if the data is committed (Step 1) but the corresponding meta-data is not updated (Step 2), the system treats the change as incomplete and discards it as a rollback of an incomplete change. This situation means that Step 2 was only partially completed before the crash, so the system repeats Step 1. (2) If the meta-data of the reflected file is updated to confirmed, the benign changes have been successfully committed to the original file. In this case, the reflected file is removed from the protected area. Note that a malicious process may attack the Malice Score Calculation (MSC) function by trying to keep its malice score low while performing destructive changes. We elaborate more on these scenarios in Section 4.8.
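The crash-handling rules can be sketched as a recovery function. This is an interpretation under stated assumptions: the text describes case (1) as a rollback followed by repeating Step 1, which this sketch models as replaying the recorded blocks (safe to repeat because each block carries an absolute offset); a `PENDING` state is assumed for changes never committed; and `recover_after_crash` is a hypothetical name.

```python
from enum import Enum
from typing import List, Tuple


class CommitState(Enum):
    PENDING = "pending"      # Step 1 never started; original file untouched
    COMMITTED = "committed"  # Step 1 done, Step 2 not recorded
    CONFIRMED = "confirmed"  # Step 2 recorded


def recover_after_crash(original: bytearray,
                        extents: List[Tuple[int, bytes]],
                        state: CommitState) -> Tuple[CommitState, bool]:
    """Return (state after recovery, whether to remove the reflected file)."""
    if state is CommitState.CONFIRMED:
        # Benign changes already reached the original file; drop the copy.
        return state, True
    if state is CommitState.COMMITTED:
        # Meta-data never reached "confirmed": redo Step 1. Rewriting the same
        # bytes at the same absolute offsets is idempotent.
        for offset, data in extents:
            end = offset + len(data)
            if end > len(original):
                original.extend(b"\x00" * (end - len(original)))
            original[offset:end] = data
        return CommitState.COMMITTED, False
    # PENDING: nothing was written to the original, keep the reflected file.
    return state, False
```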
User Notification. The trusted output that Redemption uses is a visual alert shown whenever a malicious process is detected. We designed the alert messages to be displayed at the top of the screen so that they are easily noticeable. Since benign applications usually require explicit input from the user (e.g., clicking on specific buttons, filling out the path prompt) before performing any sensitive operation on the files, the user is highly likely to be present and interacting with the computer, making it difficult for her to miss an alert.
4.5 Detection Approach
As mentioned earlier, an important function of Redemption is system-wide application monitoring. For each process that requires privileged access to user files, we assign a malice score. The malice score of a process represents the risk that the process exhibits ransomware behavior; that is, it determines whether the Redemption monitor should allow the process to access the files or notify the user. In the following, we explain the features we use to calculate the malice score of a process. The features mainly target content-based (i.e., changes in the content of each file) and behavior-based (i.e., cross-file behavior of a process) characteristics of ransomware attacks.
4.5.1 Content-based Features
Entropy Ratio of Data Blocks. For every read and write request to a file, Redemption computes the entropy [69] of the corresponding data buffers in the I/O traces, similar to [53]. Comparing the entropy of read and write requests to and from the same file offset serves as a strong indicator of ransomware behavior. This is due to the popular strategy of reading in the original file data, encrypting it, and writing the encrypted version back.
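The entropy comparison can be illustrated with a short sketch. It assumes Shannon entropy measured in bits per byte over the raw buffer, a common choice that is not necessarily the exact estimator used by Redemption; `shannon_entropy` and `entropy_increase` are illustrative names.

```python
import math
import os
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())


def entropy_increase(read_buf: bytes, write_buf: bytes) -> float:
    """Entropy written back minus entropy read from the same offset; a large
    positive value is consistent with in-place encryption."""
    return shannon_entropy(write_buf) - shannon_entropy(read_buf)


if __name__ == "__main__":
    plaintext = b"quarterly report: revenue grew in every region. " * 80
    ciphertext_like = os.urandom(len(plaintext))  # stand-in for encrypted data
    print(shannon_entropy(plaintext), shannon_entropy(ciphertext_like))
    print(entropy_increase(plaintext, ciphertext_like))
```

Text-like data typically sits well below 8 bits per byte, while encrypted or random output approaches 8, which is why a jump between the read and the write at the same offset is informative.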
File Content Overwrite. Redemption monitors how a process requests write access to data blocks. In a typical ransomware attack, to minimize the chance of recovering files, the malicious process overwrites the content of the user files with random data. Our system increases the malice score of a process as it requests write access to different parts of a file; a process that overwrites the entire content of a file receives a higher malice score.
Delete Operation. If a process requests to delete a file that belongs to the end-user, it receives a higher malice score. Ransomware samples may not overwrite the data blocks of the user files directly, but rather generate an encrypted version of the file and delete the original file.
4.5.2 Behavior-based Features
Directory Traversal. During an attack, the malicious process often enumerates user files and starts encrypting them with an encryption key. A process receives a higher malice score if it is iterating over files in a given directory. Note that a typical benign encryption or compression program may also iterate over the files in a directory. However, the generated requests are usually for reading the content of the files, and the encrypted or compressed version of the file is written to a different path. The intuition here is that ransomware usually intends to lock as many files as possible to force the victim to pay.
Converting to a Specific File Type. A process receives a higher malice score if it converts files of differing types and extensions to a single known or unknown file type. The intuition here is that in many ransomware attacks, unlike most benign applications, which are designed to operate on specific types of files, the malicious process targets all kinds of user files. To this end, Redemption logs whether a process requests access to widely varying classes of files (e.g., videos, images, documents). Note that accessing multiple files with different extensions is not necessarily malicious. A representative example is a media player that plays both .mp3 (audio) and .avi (video) files. However, such applications typically open the files with read permission and, more importantly, generate only one request in a short period of time, since the application requires specific input from the user. Hence, the key insight is that a malicious ransomware process would overwrite or delete the original files.
Access Frequency. If a process frequently generates write requests to user files, it receives a higher malice score. We monitor δ, the time between two consecutive write access requests on two different user files. Our intuition is that ransomware attacks programmatically list the files and request access to them. Therefore, the δ between two write operations on two different files is short, unlike benign applications, which usually require some input from the user first in order to perform the required operation.
4.5.3 Evaluating the Feature Set
In real-world scenarios, the assumption that all features are equally important rarely holds. Therefore, we performed a set of measurements to relax this assumption. We used a Recursive Feature Elimination (RFE) approach to determine the significance of each feature. The analysis started by incorporating all the features and measuring the FP and TP rates. Then, in each step, the feature with the minimum weight was removed, and the FP and TP rates were recalculated by performing 10-fold cross-validation to quantify the contribution of each feature. The assigned weights were then used as the coefficients of the features in Equation 4.1 in Section 4.5.4.
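The elimination loop described above can be sketched generically. The model fitting and the 10-fold cross-validation are left as caller-supplied hooks (`fit_weights` and `cross_validate` are hypothetical names), since the text does not specify the underlying classifier; the sketch only shows the fit, evaluate, drop-weakest cycle.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Weights = Dict[str, float]
Metrics = Tuple[float, float]  # (true-positive rate, false-positive rate)


def recursive_feature_elimination(
    features: Sequence[str],
    fit_weights: Callable[[List[str]], Weights],
    cross_validate: Callable[[List[str]], Metrics],
) -> List[Tuple[List[str], Weights, Metrics]]:
    """Repeatedly fit, evaluate, and drop the lowest-weight feature.

    fit_weights returns a weight per feature for the current subset;
    cross_validate returns (TPR, FPR) for that subset, e.g. via 10-fold CV.
    """
    remaining = list(features)
    history = []
    while remaining:
        weights = fit_weights(remaining)
        metrics = cross_validate(remaining)
        history.append((list(remaining), weights, metrics))
        weakest = min(remaining, key=lambda f: weights[f])
        remaining.remove(weakest)
    return history
```

The returned history records the TP and FP rates for every subset, which is the information needed to compare combinations such as content-only versus all features.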
Our experiments on several combinations of features show that the highest false positive rate, 5.9%, is produced when Redemption incorporates only the content-based features (F1). The reason is that file compression applications, when configured to delete the original files, are reported as false positives. During our experiments, we also found that in document editing programs such as Microsoft PowerPoint or Microsoft Paint, if the user inserts a large image into the editing area, the content-based features that monitor content traversal or payload entropy falsely report the application as anomalous. However, when behavior-based features were incorporated, such programs did not receive a high anomaly score, since they exhibit no cross-file activity with write privilege similar to ransomware attacks. When all the features are combined (i.e., F12), the minimum false positive rate (0.5% FP with 100% TPs) is produced on the labeled dataset. Hence, we use the combination of all the features in our system.
4.5.4 Malice Score Calculation (MSC) Function
The MSC function allows the system to identify a suspicious process and notify the user when the process matches the abstract model. Given a process X, we assign a malice score S to the process each time it requests privileged access to a user file. If the malice score S exceeds a predefined malice threshold α, the process exhibits abnormal behavior. Hence, we suspend the process and ask the user to confirm the suspicious action. In the following, we provide more details on how we determine the malice score for each process that requests privileged operations on user files:
(r1): A process that increases the entropy of the data blocks between a read and a write request receives a higher malice score. The feature value is calculated as the additive inverse of the ratio of the read and write entropy values and lies in [0,1]: the higher the entropy of the write operation, the closer the value is to 1. If the entropy of the data block in the write is smaller than in the read operation, we assign the value 0 to this feature.
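The text describes r1 as the "additive inverse" of the read/write entropy ratio. One reading that matches the stated range and the rule that a non-increasing write entropy scores 0 is r1 = 1 - H_read / H_write; the sketch below implements that assumed form, and `r1_entropy_feature` is an illustrative name.

```python
def r1_entropy_feature(h_read: float, h_write: float) -> float:
    """Feature r1 under the assumed reading r1 = 1 - H_read / H_write.

    Returns 0.0 when the written block is not more random than the block read
    from the same offset; otherwise a value in (0, 1] that approaches 1 as the
    write entropy grows relative to the read entropy.
    """
    if h_write <= h_read:
        return 0.0
    # h_write > h_read >= 0 here, so the division is safe.
    return 1.0 - h_read / h_write
```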
(r2): If a process iterates over the content of a file with write privilege, it receives a higher malice score. If the size of the file A is $s_A$, and $y_A$ is the total size of the data blocks modified by the process, the feature is calculated as $y_A / s_A$, where the more data blocks the process modifies, the closer the value is to 1.
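One way to obtain $y_A$ from a stream of write requests is to merge their (offset, length) intervals so that a region rewritten several times is counted once. That de-duplication, the clipping of writes to the original file size, and the name `r2_overwrite_ratio` are assumptions; the text only defines $y_A$ as the total size of the modified blocks.

```python
from typing import Iterable, Tuple


def r2_overwrite_ratio(file_size: int, writes: Iterable[Tuple[int, int]]) -> float:
    """Feature r2 = y_A / s_A: fraction of file A's bytes touched by writes.

    `writes` holds (offset, length) pairs from the process's write requests.
    Overlapping writes are merged so a region rewritten twice counts once.
    """
    if file_size <= 0:
        return 0.0
    # Clip each write to the file bounds and sort by start offset.
    spans = sorted(
        (max(0, off), min(file_size, off + length))
        for off, length in writes
        if length > 0
    )
    covered, cur_start, cur_end = 0, None, None
    for start, end in spans:
        if start >= end:  # write lies entirely beyond the original size
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / file_size
```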
(r3): If a process requests to delete a file, this behavior is marked as suspicious. If a process exhibits such I/O activity, the value 1 is assigned to r3.
(r4): Redemption monitors whether the process traverses the user files with write privilege, and computes the additive inverse of the number of privileged accesses to unique files in a given path. The output of the function lies in [0,1]. Given a process X, the function assigns a higher malice score as X generates more write requests to access files in a given path. Here, $write(X, f_i)$ is the $i$-th independent write request generated by the process X on a given file $f_i$.
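The exact expression for r4 is not given here. A plausible reading of "additive inverse" that stays in [0,1] and grows with the number of files is 1 - 1/n, where n is the number of distinct files in one directory that the process opened for writing. The sketch below uses that assumed form, treats "a given path" as the file's parent directory, and takes the maximum over directories; all three choices, and the function names, are assumptions.

```python
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Dict, Iterable, Set


def r4_traversal_scores(written_files: Iterable[str]) -> Dict[str, float]:
    """Per-directory score 1 - 1/n, where n is the number of distinct files in
    that directory the process opened for writing (assumed form of r4)."""
    per_dir: Dict[str, Set[str]] = defaultdict(set)
    for path in written_files:
        p = PureWindowsPath(path)
        per_dir[str(p.parent).lower()].add(p.name.lower())
    return {d: 1.0 - 1.0 / len(names) for d, names in per_dir.items()}


def r4_feature(written_files: Iterable[str]) -> float:
    """Highest per-directory traversal score; 0.0 if the process wrote nothing."""
    return max(r4_traversal_scores(written_files).values(), default=0.0)
```

A single edited file yields 0, while a process that writes to dozens of files in the same folder quickly approaches 1.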
(r5): Given a set of document classes, Redemption monitors whether the process requests write access to files that belong to different document classes. Files A and B belong to two different document classes if the program that opens file A cannot take file B as a valid input. For example, a docx and a pdf file belong to two different document classes, since a docx file cannot be opened via a PDF editor program. We assign the score 1 if the process performs cross-document access requests similar to ransomware.
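A minimal sketch of r5 maps file extensions to document classes and checks whether a process has written to more than one class. The extension table, the two-class trigger, and the function name are hypothetical; the dissertation does not list the document classes Redemption uses.

```python
from pathlib import PureWindowsPath
from typing import Dict, Iterable

# Hypothetical extension-to-class table for illustration only.
DOCUMENT_CLASSES: Dict[str, str] = {
    ".docx": "word", ".doc": "word",
    ".pdf": "pdf",
    ".xlsx": "spreadsheet", ".csv": "spreadsheet",
    ".jpg": "image", ".png": "image",
    ".mp3": "audio", ".avi": "video",
}


def r5_cross_class(written_files: Iterable[str], min_classes: int = 2) -> float:
    """Return 1.0 if the process wrote to files from at least `min_classes`
    distinct document classes, else 0.0. Unknown extensions are ignored."""
    classes = set()
    for path in written_files:
        cls = DOCUMENT_CLASSES.get(PureWindowsPath(path).suffix.lower())
        if cls is not None:
            classes.add(cls)
    return 1.0 if len(classes) >= min_classes else 0.0
```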
(r6): The system computes the elapsed time (δ) between two subsequent write requests generated by a single process to access two different files. 1/δ represents the access frequency: as the elapsed time between two write requests increases, the access frequency decreases.
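A per-process tracker for δ and 1/δ might look like the following sketch. The class name, the use of seconds as the time unit, and returning infinity for zero or negative gaps are assumptions. The text does not say how 1/δ is normalized before it enters the MSC function, so this sketch returns the raw frequency.

```python
from typing import Dict, Optional, Tuple


class AccessFrequencyTracker:
    """Tracks r6 = 1/delta per process, where delta is the time between two
    consecutive write requests by the same process to two different files."""

    def __init__(self) -> None:
        # pid -> (timestamp of the last write, file it targeted)
        self._last: Dict[int, Tuple[float, str]] = {}

    def on_write(self, pid: int, path: str, t: float) -> Optional[float]:
        """Record a write at time t (seconds). Return 1/delta if this write
        targets a different file than the previous one, else None."""
        prev = self._last.get(pid)
        # Always remember the most recent write, even to the same file.
        self._last[pid] = (t, path)
        if prev is None or prev[1] == path:
            return None
        delta = t - prev[0]
        if delta <= 0:
            return float("inf")
        return 1.0 / delta
```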
We define the overall malice score of a process at time t by applying the weights of the individual features:
$$\mathrm{MSC}(r) = \frac{\sum_{i=1}^{k} w_i \times r_i}{\sum_{i=1}^{k} w_i} \tag{4.1}$$
where $w_i$ is the predefined weight of feature $i$ in the MSC function. The value of $w_i$ is based on the experiment discussed in Section 4.5.3. The weights we used in Equation 4.1 are $w_1 = 0.9$, $w_2 = 1.0$, $w_3 = 0.6$, $w_4 = 1.0$, $w_5 = 0.7$, and $w_6 = 1.0$.
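Equation 4.1 is a weighted average of the feature values. The sketch below plugs in the weights reported above; it assumes every $r_i$ has already been mapped to [0,1] and that α is supplied by the caller, since its value is not stated here. `malice_score` and `exceeds_threshold` are illustrative names.

```python
from typing import Sequence

# Weights w1..w6 for features r1..r6, as reported for Equation 4.1.
WEIGHTS = (0.9, 1.0, 0.6, 1.0, 0.7, 1.0)


def malice_score(r: Sequence[float], w: Sequence[float] = WEIGHTS) -> float:
    """Weighted average of feature values: sum(w_i * r_i) / sum(w_i)."""
    if len(r) != len(w):
        raise ValueError("need one weight per feature")
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)


def exceeds_threshold(r: Sequence[float], alpha: float) -> bool:
    """True if the process should be suspended and the user asked to confirm."""
    return malice_score(r) > alpha
```

For instance, a process with r = (1, 1, 0, 1, 0, 1), i.e., high-entropy full overwrites across a directory at a high rate but no deletions and no cross-class access, scores 3.9 / 5.2 = 0.75, since the weights sum to 5.2.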
Note that when Redemption is active, even when using all the combined features, file encryption or secure deletion applications are typically reported as suspicious. As mentioned earlier, such applications generate requests to access user files that are very similar to those of ransomware. For example, in a secure deletion application, the process iterates over the entire content of the given file with write privileges and writes random payloads over the contents. The same procedure is repeated for the other files in the path. Hence, such cases are reported to the user as violations or other inappropriate uses of their critical resources.
4.6 Implementation
In this section, we provide the implementation details of Redemption. Note that our design is sufficiently general to be applied to any OS that is a potential target for ransomware. However, we built our prototype for the Windows environment, which is the main target of ransomware attacks today.
Monitoring Access Requests. Redemption must interpose on all privileged accesses to sensitive files. The system is implemented on the Windows Kernel Development framework without any modifications to the underlying file system semantics. On Windows, it suffices to monitor the write or delete requests from the I/O system to the base file system driver. Furthermore, to guarantee minimal data loss, Redemption redirects the write requests from the user files to the corresponding reflected files. The reflected files are implemented via sparse files on NTFS; that is, NTFS does not allocate disk space to a reflected file except in regions where it contains non-zero data. When a process requests to open a user file, a sparse file with the same name is created or opened in the protected area. The sparse files are created by calling the function FltFsControlFile with the control code FSCTL_SET_SPARSE. The size of the file is then set to the size of the original file by calling FltSetInformationFile.
Redemption updates the FileName field in the file object of the create request to point to the sparse file. By doing this, the system redirects the operation to the reflected file, and the corresponding handle is returned to the requesting process. Each write request is then executed on the handle of the reflected file that was returned to the process when it opened the file. Each write request contains the offset and the length of the data block that the process wishes to write.
If the write request is successfully performed, the corresponding meta-data of the reflected file (i.e., the offset and the length of the modified regions of the original file) is updated with the values in the write request. In our prototype, the meta-data entry that represents the modified regions is implemented via reparse points, a Microsoft mechanism for attaching a collection of application-specific data to a file, which is interpreted by Redemption, the component that sets the tags. When the system sets a reparse point, a unique reparse tag is associated with it, which is then used to identify the offset and the length of every change. The reparse point is set by calling FltTagFile when the file is created by Redemption. On subsequent accesses to the file in the protected area, the reparse data is parsed via FltFsControlFile with the appropriate control code (i.e., FSCTL_GET_REPARSE_POINT). Hence, the redirection is achieved by intercepting the original write request, performing the write on the reflected file, and completing the original request while tracking the write contents.
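The redirection and meta-data tracking can be modeled without any Windows APIs. The in-memory `ReflectedFile` class below is a hypothetical stand-in: a list of recorded extents plays the role of the sparse reflected file and its reparse data, and reads overlay the recorded writes on the untouched original. It does not call FltFsControlFile, FltTagFile, or any other filter-manager routine.

```python
from typing import List, Tuple


class ReflectedFile:
    """In-memory model of a reflected file: writes never touch the original
    content, and each write's (offset, length) is kept as meta-data."""

    def __init__(self, original: bytes) -> None:
        self._original = bytes(original)  # never modified
        self._size = len(original)
        self.extents: List[Tuple[int, bytes]] = []  # (offset, data), in order

    def write(self, offset: int, data: bytes) -> int:
        """Redirected write; returns the number of bytes accepted."""
        if offset < 0:
            raise ValueError("negative offset")
        self.extents.append((offset, bytes(data)))
        self._size = max(self._size, offset + len(data))
        return len(data)

    def metadata(self) -> List[Tuple[int, int]]:
        """(offset, length) of every modified region, in write order."""
        return [(off, len(d)) for off, d in self.extents]

    def current_view(self) -> bytes:
        """What the writing process reads back: original plus later writes."""
        buf = bytearray(self._original.ljust(self._size, b"\x00"))
        for off, data in self.extents:
            buf[off:off + len(data)] = data
        return bytes(buf)

    def original(self) -> bytes:
        """The untouched original content, available for recovery."""
        return self._original
```

Because the original bytes are never modified, discarding the extents restores the file exactly, while replaying them commits the changes once a process is judged benign.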
The consistency of the data redirected to the sparse files is an important design requirement of the system. Therefore, frequent flushing is required to avoid potential user data loss. This approach is not without cost, as multiple write requests are required to ensure that critical data is written to persistent media. To this end, we use the Microsoft-recommended approach of opening sparse files for unbuffered I/O upon creation and enabling write-through caching via the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. With write-through caching enabled, data is still written into the cache, but the cache manager writes the data to disk immediately rather than deferring it to the lazy writer. Windows recommends this approach as a replacement for calling the FlushFileBuffers function after each write, which usually causes unnecessary performance penalties in such applications.
Behavioral Detection and Notification Module. We implemented this module as a user-mode service. This was a conscious design choice, similar to the design of most anti-malware solutions. Note that Microsoft officially supports the concept of protected anti-malware services, tied to Early Launch Anti-Malware (ELAM), which allows anti-malware user-mode services to be launched as protected services. Once the service is launched as a protected service, Windows uses code integrity to allow only trusted code to load into it. Windows also protects these processes from code injection and other attacks from admin processes [79]. If Redemption identifies a malicious process, it automatically terminates that process.
4.7 Evaluation
The Redemption prototype supports all Windows platforms. In our experiments, we used Windows 7, simply attaching Redemption to the file system. We took popular anti-evasion measures similar to those in our experiments in Chapter 3. The remainder of this section discusses how the benign and malicious datasets were collected, and how we conducted the experiments to evaluate the effectiveness of our approach.
4.7.1 Dataset
The ground truth dataset consists of file system traces of manually confirmed ransomware samples, as well as more than 230 GB of data capturing the interaction of benign processes with the file system on multiple machines. We used this dataset to verify the effectiveness of Redemption and to determine the best threshold value for labeling a process as suspicious.
Collecting Ransomware Samples. We collected ransomware samples from public repositories [1, 4] that are updated on a daily basis, and from online forums that share malware samples [3, 71]. In total, we collected 9,432 recent samples, and we confirmed 1,174 of them to be active ransomware from 29 contemporary ransomware families. We used 504 of the samples, from 12 families, in our training dataset. Table 4.2 describes the dataset we used in this experiment.
Collecting Benign Applications. One of the challenges in testing Redemption was collecting a sufficient amount of benign data, representative of realistic file system use, for model training purposes. To test the proposed approach with realistic workloads, we deployed a version of Redemption on five separate Windows 7 machines in two different time slots of seven days each, collecting more than 230 GB of data. The users of the machines were advised to perform their daily activities on their machines. Redemption operated in monitoring mode and did not collect any sensitive user information such as credentials, browsing history, or personal data. The collected information only included the interaction of processes with the file system, which was required to model benign interaction with the file system. All the extracted data was anonymized before performing any further experiments. Based on the collected dataset, we created a pool of application traces that consisted of 65 benign executables, including applications that exhibit ransomware-like behavior such as secure deletion, encryption, and compression. The application pool consisted of document editors (e.g., Microsoft Word), audio/video editors (e.g., Microsoft Live Movie Maker, Movavi Video Editor), file compression tools (e.g., Zip, WinRAR), file encryption tools (e.g., AxCrypt, AESCrypt), and popular web browsers (e.g., Firefox, Chrome). Due to space limitations, Table 4.1 lists only a subset of the benign applications we used in our analysis.
4.7.2 Detection Results
As discussed in Section 4.4, one of the design requirements of the system is to produce few false positives and to minimize the number of unnecessary notifications to the user. To this end, the system employs a threshold value to determine when an end user should be notified about the suspicious behavior of a process.
We tested a large set of benign and ransomware samples on a Redemption-enabled machine. As shown in Table 4.1 and Table 4.2, the median score of benign applications is significantly lower than that of ransomware samples. File encryption programs such as AxCrypt, which are specifically designed to protect users' privacy, overwrite the original file with random data once the encrypted version is generated. In this case, Redemption reports the action as malicious, which is in fact a false positive. Unfortunately, such false positives are inevitable, since these programs exhibit exactly the behavior of typical ransomware. In such cases, Redemption informs the end user and asks for manual confirmation. Given these corner cases, we set the malice threshold to α = 0.12, where the system achieves the best detection and false positive rates (FP rate = 0.5% at TP rate = 100%). Figure 4-3 shows the false positive and true positive rates as a function of the malice score on the labeled dataset. This malice threshold is still significantly lower than the minimum malice score of every ransomware family in the dataset, as shown in Table 4.2. The table also reports the median number of recovered files: Redemption detects a malicious process and successfully recovers the encrypted data after observing a median of four files. Our experiment on the dataset also showed that 7 GB of storage is sufficiently large for the protected area to enforce the data consistency policy.
Figure 4-3: TP/FP analysis of Redemption based on the best threshold value.
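The choice of α can be illustrated with a small threshold sweep: for each candidate threshold, compute the TP rate over ransomware traces and the FP rate over benign traces, then keep the threshold with the lowest FP rate among those that still detect every ransomware trace. The helper names and all scores below are hypothetical placeholders, not the thesis dataset, and the selection rule is one plausible reading of "best detection and false positive rates".

```python
def rates(benign, malicious, alpha):
    """Return (tp_rate, fp_rate) when scores >= alpha are flagged as malicious."""
    tp = sum(s >= alpha for s in malicious) / len(malicious)
    fp = sum(s >= alpha for s in benign) / len(benign)
    return tp, fp


def pick_threshold(benign, malicious, candidates):
    """Among thresholds with a 100% TP rate, return (alpha, fp_rate) minimising FP.

    Candidates are scanned in ascending order and only a strictly lower FP
    rate replaces the current choice, so ties resolve to the lower threshold.
    """
    best = None
    for alpha in sorted(candidates):
        tp, fp = rates(benign, malicious, alpha)
        if tp == 1.0 and (best is None or fp < best[1]):
            best = (alpha, fp)
    return best


# Hypothetical per-trace malice scores, for illustration only.
benign_scores = [0.0, 0.01, 0.03, 0.05, 0.09, 0.1, 0.3, 0.7]
ransomware_scores = [0.31, 0.4, 0.52, 0.7]
print(pick_threshold(benign_scores, ransomware_scores, [0.05, 0.1, 0.12, 0.2, 0.35]))
```

With these placeholder scores, raising α lowers the FP rate until the threshold passes the weakest ransomware score, at which point the TP rate drops below 100% and that candidate is rejected.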
Testing with Known/Unknown Samples. In addition to the 10-fold cross-validation on 504 samples, we also tested Redemption with unknown benign and malicious datasets. The tests included 29 ransomware families, 57% of which were not present in the training dataset. We also used the file system traces of benign processes from the second time slot, as discussed in Section 4.7.1, as the unseen benign dataset in this test. Table 4.3 lists the ransomware families we used in our experiments, along with the datasets used in prior work [35, 85, 62]. In this experiment, we used the same malice threshold α = 0.12 as in the previous experiment and manually checked the detection results to measure the FP and TP rates. In this set of experiments, the system achieves a TP rate of 100% at an FP rate of 0.8%. Note that the number of FP cases depends on the value of the malice threshold. We selected this conservative value to be able to detect all possible ransomware behaviors. Observing realistic workloads on a larger group of machines could lead to a more comprehensive model, more accurate calibration of the malice threshold, and ultimately lower FP rates. Nevertheless, our experiments on 677 ransomware samples from 29 ransomware families show that Redemption detects the malicious process in all 29 families after observing a median of 5 files. We suspect that the difference from the median of four files in the previous experiment is due to differences in the size of the files being attacked. This is a very promising result, since the detection rate of the system did not change when we added unknown ransomware families that do not necessarily follow the same attack techniques (e.g., they use different cryptosystems). The results of this experiment also show that the number of files exposed to ransomware does not change significantly for families that Redemption was not trained on. This result indicates that the system can detect a significant number of unseen ransomware attacks.
4.7.3 Disk I/O and File System Benchmarks
In order to evaluate the disk I/O and file system performance of Redemption, we used IOzone [9], a well-known file system benchmark tool that runs on Windows. We first generated 100 files of 512 MB each to test the throughput of block write, rewrite, and read operations. Next, we tested the standard file system operations by creating and accessing 50,200 files, each containing 1 MB of data, in multiple directories. We ran IOzone as a normal process. For comparison, we ran every experiment on both a standard and a Redemption-protected host, repeated each experiment 10 times, and averaged the scores to obtain the final results. We wrote a script in AutoIt [8] to automate the tasks. The results are summarized in Table 4.4.

Table 4.1: A list of benign applications and their malice scores.

Program              Min. Score   Max. Score
Adobe Photoshop      0.032        0.088
AESCrypt             0.37         0.72
AxCrypt              0.31         0.75
Adobe PDF reader     0.0          0.0
Adobe PDF Pro        0.031        0.039
Google Chrome        0.037        0.044
Internet Explorer    0.035        0.045
Matlab               0.038        0.92
MS Word              0.041        0.089
MS PowerPoint        0.025        0.102
MS Excel             0.017        0.019
VLC Player           0.0          0.0
Vera Crypt           0.33         0.71
WinRAR               0.0          0.16
Windows Backup       0.0          0.0
Windows paintit      0.029        0.083
SDelete              0.283        0.638
Skype                0.011        0.013
Spotify              0.01         0.011
Sumatra PDF          0.022        0.041
Zip                  0.0          0.16
Malice Score Median  0.027        0.0885
The experiments show that Redemption performs well under heavy read and write workloads, imposing overheads of 2.8% and 3.4%, respectively. However, rewrite and create operations can experience slowdowns ranging from 7% to 9% when dealing with a large number of small files. Creating the reflected files and redirecting the write requests to the protected area are the main causes of this performance hit under high workloads. These results also suggest that Redemption might not be suitable for workloads involving many small files, such as compiling large software projects. However, such heavy workloads do not represent the deployment cases Redemption is designed to target (i.e., protecting the end host of a typical user who surfs the web and engages in productivity activities such as writing text and sending emails).

Table 4.2: A list of ransomware families and their malice scores.

Family               Samples   Min. Score   Max. Score   File Recovery
Cerber               33        0.41         0.73         5
Cryptolocker         50        0.36         0.77         4
CryptoWall3          39        0.4          0.79         6
CryptXXX             46        0.49         0.71         3
CTB-Locker           53        0.38         0.75         7
CrypVault            36        0.53         0.73         3
CoinVault            39        0.42         0.69         4
Filecoder            54        0.52         0.66         5
GpCode               45        0.52         0.76         2
TeslaCrypt           37        0.43         0.79         4
Virlock              29        0.51         0.72         3
SilentCrypt          43        0.31         0.59         9
Total Samples        504       -            -            -
Score Median         -         0.43         0.73         -
File Recovery Median -         -            -            4

Family                 Redemption   CryptoDrop [85]   ShieldFS [35]   PayBreak [62]
                       Samples/FA   Samples/FA        Samples         Samples
Almalocker             -            -                 -               1
Androm                 -            -                 -               2
Cerber                 30/6         -                 -               1
Chimera                -            -                 -               1
CoinVault              19/5         -                 -               -
Critroni               16/6         -                 17              -
Crowti                 22/8         -                 -               -
CryptoDefense          42/7         18/6.5            6               -
CryptoLocker (copycat) -            2/20              -               -
Cryptolocker           29/4         31/10             20              33
CryptoFortress         12/7         2/14              -               2
CryptoWall             29/5         8/10              8               7
CrypWall               -            -                 -               4
CrypVault              26/3         -                 -               -
CryptXXX               45/3         -                 -               -
CryptMIC               7/3          -                 -               -
CTB-Locker             33/6         122/29            -               -
DirtyDecrypt           8/3          -                 3               -
DXXD                   -            -                 -               2
Filecoder              34/5         72/10             -               -
GpCode                 45/3         13/22             -               2
HDDCryptor             13/5         -                 -               -
Jigsaw                 12/4         -                 -               -
Locky                  21/2         -                 154             7
MarsJokes              -            -                 -               1
MBL Advisory           12/4         1/9               -               -
Petya                  32/5         -                 -               -
PayCrypt               -            -                 3               -
PokemonGo              -            -                 -               1
PoshCoder              17/4         1/10              -               -
TeslaCrypt             39/6         149/10            73              4
Thor Locky             -            -                 -               1
TorrentLocker          21/6         1/3               12              -
Tox                    15/7         -                 -               9
Troldesh               -            -                 -               5
Virlock                29/7         20/8              -               4
Razy                   -            -                 -               3
SamSam                 -            -                 -               4
SilentCrypt            43/8         -                 -               -
Xorist                 14/7         51/3              -               -
Ransom-FUE             -            1/19              -               -
WannaCry               7/5          -                 -               -
ZeroLocker             5/8          -                 1               -
Total Samples (Families)   677 (29)   492 (15)   305 (11)   107 (20)
Files Attacked/Recovered (FA/FR) Median   5/5   10/0   -   -

Table 4.3: The ransomware families used to test Redemption and other proposed techniques. FA: files attacked; FR: files recovered.
Another important question is how many files should be maintained in the protected area when Redemption is active. Since the protected area is sufficiently large, the system can maintain several files without committing them to the disk and updating the original files. However, this approach may not be desirable in scenarios where read operations frequently occur immediately after write operations (e.g., databases). In these scenarios, Redemption must redirect read operations, in addition to write requests, to the protected area, which is not ideal from a usability perspective. To quantify this cost, we also performed an I/O benchmark on the protected area by requesting write access to files, updating the files, and committing the changes to the protected area without updating the original files. We created a script that immediately generates read requests to access the updated files. The I/O benchmark on the protected area shows that the performance overhead for read operations is less than 3.1% when 100 files with a median file size of 17.4 MB are maintained in the protected area. This number of files is significantly larger than the maximum number of files Redemption needs to observe to identify a suspicious process. Note that we consider scenarios where read operations are requested immediately after write operations in order to exercise the redirection mechanism under high loads. Based on this benchmark, we conclude that the read redirection mechanism does not impose the significant overhead we initially expected. In the following, we demonstrate that Redemption incurs minimal performance overhead when executing more realistic workloads for our target audience.
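Read redirection of this kind amounts to splitting each read request into pieces served from the protected area (byte ranges covered by recorded write extents) and pieces served from the unmodified original file. The Python sketch below shows that interval logic; the function names and the "reflected"/"original" labels are hypothetical, and extents are assumed to be the (offset, length) pairs recorded for each redirected write.

```python
def merge_extents(extents):
    """Merge (offset, length) write extents into sorted, non-overlapping (start, end) ranges."""
    merged = []
    for start, length in sorted(extents):
        if length <= 0:
            continue  # ignore empty writes
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlaps or touches previous range
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def split_read(offset, length, extents):
    """Split a read of [offset, offset + length) into (source, start, end) pieces.

    'reflected' pieces must be served from the protected area; 'original'
    pieces can be read directly from the unmodified file.
    """
    pieces = []
    pos, end = offset, offset + length
    for s, e in merge_extents(extents):
        if e <= pos:
            continue          # range lies entirely before the read
        if s >= end:
            break             # range (and all later ones) lies after the read
        if s > pos:
            pieces.append(("original", pos, s))
        pieces.append(("reflected", max(s, pos), min(e, end)))
        pos = min(e, end)
    if pos < end:
        pieces.append(("original", pos, end))
    return pieces
```

A read that touches no modified range is served entirely from the original file, which is why the redirection cost stays small when only a few files sit in the protected area.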
4.7.4 Real-world Application Testing
To obtain measurable performance indicators that characterize the overhead of Redemption, we created micro-benchmarks that exercise the critical performance paths of Redemption. Developing benchmarks and custom test cases requires careful consideration of factors that might affect the runtime measurements. For example, a major challenge we had to tackle was automating the testing of desktop applications with graphical user interfaces. In order to perform the tests as identically as possible on the standard and Redemption-enabled machines, we wrote scripts in AutoIt to interact with each application while monitoring its performance impact. To this end, we launched the application from the script and waited 5 seconds for the program window to appear. We then automatically checked whether the GUI of the application was the active window, and the script forced the application's window to the foreground. We then interacted with the edit control and other parts of the program, using the handle returned by the AutoIt script, to exercise the core features of the application. As in the previous experiment, we repeated each test 10 times. We present the average runtimes in Table 4.5.

Table 4.4: Disk I/O performance in a standard and a Redemption-protected host.

Operation   Original Performance   Redemption Performance   Overhead (%)
Write       112,456.25 KB/s        110,094.67 KB/s          3.4%
Rewrite     68,457.57 KB/s         62,501.76 KB/s           8.7%
Read        114,124.78 KB/s        112,070.53 KB/s          2.8%
Create      12,785 files/s         11,852 files/s           7.3%

Table 4.5: Runtime overhead of Redemption on a set of end-point applications.

Application    Original (s)   Redemption (s)   Overhead (%)
AESCrypt       165.55         173.28           4.67%
AxCrypt        182.4          191.72           5.11%
Chrome         66.19          67.02            1.25%
IE             68.58          69.73            1.67%
Media Player   118.2          118.78           0.49%
MS Paint       134.5          138.91           3.28%
MS Word        182.17         187.84           3.11%
SDelete        219.4          231.0            5.29%
Vera Crypt     187.5          196.46           4.78%
Winzip         139.7          141.39           1.21%
WinRAR         160.8          163.12           1.44%
zip            127.8          129.32           1.19%
Average        -              -                2.6%
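The per-row overhead figures in Tables 4.4 and 4.5 follow the usual relative-difference formulas: for throughput (higher is better) the overhead is the relative drop, and for runtime (lower is better) it is the relative increase. The sketch below applies them to a few rows of the two tables; the helper names are hypothetical, and each input is assumed to already be the mean of the 10 repetitions.

```python
def throughput_overhead(original, protected):
    """Percent slowdown when a higher value is better (e.g. KB/s, files/s)."""
    return (original - protected) / original * 100


def runtime_overhead(original, protected):
    """Percent slowdown when a lower value is better (e.g. seconds)."""
    return (protected - original) / original * 100


# Rows taken from Table 4.4 (throughput) and Table 4.5 (runtime).
rewrite = throughput_overhead(68_457.57, 62_501.76)   # Table 4.4, Rewrite
create = throughput_overhead(12_785, 11_852)          # Table 4.4, Create
aescrypt = runtime_overhead(165.55, 173.28)           # Table 4.5, AESCrypt
chrome = runtime_overhead(66.19, 67.02)               # Table 4.5, Chrome

print(f"{rewrite:.1f} {create:.1f} {aescrypt:.2f} {chrome:.2f}")  # prints: 8.7 7.3 4.67 1.25
```

Keeping the two formulas separate avoids a sign error when mixing throughput and runtime metrics in the same report.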
In our experiments, the overhead of protecting a system from ransomware was under 6% in every test case, and, on average, applications took only 2.6% longer to complete their tasks. These results demonstrate that Redemption is efficient and should not detract from the user experience. These experiments also support the claim that Redemption can provide real-time protection against ransomware without a significant performance impact. We must stress that if Redemption is deployed on machines with a primarily I/O-bound workload, lower performance should be expected, as indicated by the benchmark in Section 4.7.3.
4.7.5 Usability Experiments
We performed a user study with 28 participants to test the usability of Redemption. We applied for and received an IRB waiver for our usability experiments from the Office of Human Subject Research Protection (HSRP). The goal of the usability test is to determine whether the system provides transparent monitoring, and to evaluate how end users deal with our visual alerts. The participants were from different majors at the authors' institution and were recruited by asking for volunteers to help test a security tool. To avoid priming effects, the participants were not informed about the key functionality of Redemption. The recruitment requirement was that participants be familiar with text editors and web browsers so that they could perform the given tasks correctly. All the experiments were conducted using two identical Windows 7 virtual machines with Redemption enabled, running on two laptops. The virtual machines were given controlled Internet access, as described in Section 4.7. Redemption was configured in protection mode over the entire data space generated for the test user account. A ransomware sample was automatically started at a random time to observe how the user interacts with Redemption during a ransomware attack. After each experiment, the virtual machines were rolled back to their default state. No personal information was collected from the participants at any point during the experiments.
We asked the participants to perform three tasks to evaluate different aspects of the system. The first task was to work with instances of Microsoft Word and PowerPoint on the test machines running Redemption. The experiment observer asked the participants to compare this process with their previous experience of using Microsoft Word and PowerPoint and to rate the difficulty of interacting with the test setup on a 5-point Likert scale.
In the second task, the participants were asked to encrypt a folder containing multiple files with AxCrypt on the Redemption-enabled machine. This action caused a visual alert to be displayed, informing the participant that the operation was suspended and asking them to confirm or deny the action. The participants were asked to explain why they confirmed or denied the action.
In the last task, the participants were asked to perform a specific search on the Internet. While they were preoccupied with the task, the ransomware sample was automatically started. This action was blocked by Redemption and caused another visual alert to be displayed. As in the second task, the experiment observer monitored how participants handled the alert.
At the end of the first task, all 28 participants found the experience to be identical to using Microsoft Word and PowerPoint on their own machines. This finding empirically confirms that Redemption is transparent to the users. In the second task, 26 participants confirmed the action. The other 2 noticed the alert but denied the operation, so no file was encrypted. In the third task, all 28 participants noticed the visual alert, and none of them confirmed the operation. The participants explained that they were not sure why they received the visual alert and could not verify the operation. These results confirm that Redemption's visual alerts are able to draw all participants' attention while they are occupied with other tasks, and are effective in protecting user data. Furthermore, the experiments suggest that end users are more likely to recognize suspicious operations on their sensitive data when using Redemption's indicators. To confirm statistical significance, we performed a hypothesis test where the null hypothesis is that Redemption's indicators do not assist in identifying suspicious operations during ransomware attacks, while the alternative hypothesis is that Redemption's ransomware indicators do assist in identifying such destructive actions. Using a paired t-test, we obtain a p-value of 4.9491 × 10^-7, sufficient to reject the null hypothesis at a 1% significance level.
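For reference, a paired t-test works on the per-participant differences d_i between two conditions: t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom, where s_d is the sample standard deviation of the differences. The Python sketch below computes the statistic with the standard library only; the function name and sample data are hypothetical, the thesis does not state how each participant's measurements were encoded, and converting t to a p-value requires a t-distribution CDF (e.g., from SciPy), which is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical paired measurements for three participants.
t_stat, dof = paired_t([1, 2, 3], [2, 4, 6])
```

Pairing removes between-participant variation, since each participant serves as their own control.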
4.8 Discussion and Limitations
Unfortunately, malware research is an arms race. There is always the possibility that malware developers will find heuristics to bypass detection on analysis systems or on end-user machines. In the following, we discuss possible evasion scenarios that malware authors can use, and how Redemption addresses them.
Attacking Redemption's Monitor.
Note that the interaction of any user-mode process, as well as of kernel-mode drivers, with the file system is managed by the Windows I/O manager, which is responsible for generating the appropriate I/O requests. Since every access must first be submitted to the I/O manager, and Redemption registers callbacks for all I/O requests, it is not possible to bypass Redemption's monitor from user mode. Furthermore, since Windows Vista, direct access to the disk or volume has been prohibited for user-mode applications in order to protect file system integrity [76]. Therefore, user-mode processes have no other way to access files, and this is enforced by the operating system.
Attackers may use social engineering techniques to frustrate users, for example by creating fake alert messages that accuse a browser of being ransomware, in order to force the user to turn off Redemption. We believe these scenarios are possible. However, such social engineering attacks are well-known security problems that target all end-point security solutions, including our tool. Defending against such attacks depends more on the security awareness of users and is out of the scope of this work.
Attacking the Malice Score Calculation Function.
An attacker may also target the malice score calculation function and try to keep the malice score of the process below the threshold. For example, an attacker can write code that performs selective content overwrites, uses a low-entropy payload for content overwrites, or launches periodic file destruction. If an attacker employs any one of these techniques by itself, the malice score becomes lower, but the malicious action would still be distinguishable. For example, if the file content is overwritten with a low-entropy payload, the process receives a lower malice score. However, since the process overwrites the entire content of a file with a low-entropy payload, this behavior is itself suspicious and would be reported to the user.
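Entropy is the usual way to quantify how random an overwrite payload looks. The sketch below computes the Shannon entropy of a buffer in bits per byte (0 for a constant buffer, up to 8 for uniformly distributed bytes). It only illustrates why a low-entropy overwrite lowers an entropy-based feature; it is not the thesis's malice score function, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())
```

Ciphertext or random bytes score close to 8, while an overwrite with a repeated pattern scores near 0. This is why the low-entropy evasion reduces this feature without making a whole-file overwrite look benign.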
We believe that the worst-case scenario would be an attacker employing all three techniques simultaneously to bypass the malice score calculation function. This is a fair assumption, since developing such malware is straightforward. However, note that to launch a successful ransomware attack and force the victim to pay the ransom, the malicious program needs to attack more than one file, preferably all the files on the system. Hence, even if the malicious program employs all of the bypassing techniques, it requires some form of iteration with write permission over the user files. This action would still be observed and captured by Redemption. In this particular case, a malicious program can successfully encrypt a single user file, but a subsequent write attempt on another file would be reported to the user for confirmation if the write request occurs within a predefined six-hour period after the first attempt. This means that ransomware can successfully encrypt at most one user file every six hours (i.e., at most four files per day). We should stress that, in this particular scenario, the system cannot guarantee zero data loss. However, the system significantly decreases the effectiveness of the attack, since the number of files encrypted per day is very small.
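The six-hour rule can be modeled as a per-process time window: once a process has been allowed to write to one user file, a write to a different file within the window is held for user confirmation. The sketch below is a hypothetical model of that policy (class, method, and helper names are illustrative; only the six-hour window comes from the text). It assumes the user denies every confirmation prompt and shows why a patient attacker gets through at most four files per day, however often it retries.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # the pre-defined period stated in the text


class WriteWindowPolicy:
    """Hypothetical model of the per-process rate limit described above.

    After a process is allowed to write to one user file, a write to a
    different file within WINDOW is held for user confirmation.
    """

    def __init__(self):
        self.first_write = {}  # pid -> (path, time the write to that file was allowed)

    def check(self, pid, path, now):
        """Return 'allow' or 'confirm' for a write by pid to path at time now."""
        prev = self.first_write.get(pid)
        if prev is not None and prev[0] != path and now - prev[1] < WINDOW:
            return "confirm"
        if prev is None or prev[0] != path:
            self.first_write[pid] = (path, now)  # start a new window for the new file
        return "allow"


def files_destroyed_per_day(step):
    """Files a patient attacker gets through in 24 h, retrying every `step`.

    Assumes the user denies every confirmation prompt.
    """
    policy = WriteWindowPolicy()
    start = datetime(2017, 1, 1)
    t, target, allowed = start, 0, 0
    while t < start + timedelta(days=1):
        if policy.check(1, f"file{target}", t) == "allow":
            allowed += 1
            target += 1  # move on to the next victim file
        t += step
    return allowed
```

Retrying more frequently does not help the attacker: every attempt inside the window is held for confirmation, so only one new file per window gets through.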
Furthermore, since these approaches considerably delay a successful attack, they also increase the risk of the ransomware being detected by AV scanners on the end point before it encrypts a large number of files and forces the user to pay. Consequently, developing such stealthy ransomware may not be as profitable as current ransomware attack strategies, where the entire point of the attack is to encrypt as many files as possible in a short period of time and request money. An attacker may also avoid encrypting user files altogether and only lock the desktop once installed. This approach can make the end-user machine inaccessible. However, such changes are not persistent, regaining access to the machine is significantly easier, and this case is out of the scope of this work.
4.9 Conclusions
In this work, we proposed a generic approach, called Redemption, to defend against ransomware on the end host. We showed that by incorporating the Redemption prototype as an augmented service to the operating system, it is possible to successfully stop ransomware attacks on end-user machines. We also showed that the system incurs modest overhead, averaging 2.6% for realistic workloads. Furthermore, Redemption does not require explicit application support or any other preconditions to actively protect users against unknown ransomware attacks. We provide an anonymous video of Redemption in action in [14], and hope that the concepts we propose will be useful for end-point protection providers.
Chapter 5
Conclusions
Malware research is an adversarial field. Adversaries strive to develop new techniques to evade common detection techniques and successfully run their attacks. The evolving nature of these attacks requires the security community to constantly monitor new trends and techniques used by attackers, and to develop novel defense mechanisms. However, to ensure that the proposed solutions can be successfully adopted on the defense side, they should enhance modern detection techniques and improve the anti-evasion capabilities of current solutions while maintaining usability and imposing low overhead. As most of today's ransomware attacks target end users, any anti-ransomware technique needs to satisfy these requirements in order to achieve widespread adoption, either as an end-point solution or as a tool for research and analysis.
In light of the above considerations, the solutions proposed in this thesis are designed to satisfy these requirements. We illustrated that our proposed solutions can enhance state-of-the-art techniques while being resistant to common evasion techniques and imposing low overhead on the underlying systems.
In Chapter 2, we studied 1,359 recent ransomware samples from 15 ransomware families by monitoring how these attacks target users. We developed a kernel driver that can be attached to the file system to monitor how a malicious process targets user data. Our analysis showed that, contrary to most security reports on these attacks, the malicious payloads work very similarly across different classes of ransomware attacks, and that developing a defense mechanism against these attacks is possible.
In Chapter 3, we investigated the challenges of analyzing ransomware attacks and developed an analysis environment to detect and study unknown ransomware attacks. We proposed Unveil, a sandbox designed specifically to detect and analyze ransomware, which helps reverse engineers gain insight into the internal behavior of ransomware attacks.
In Chapter 4, we discussed the shortcomings of current solutions for protecting end users from ransomware attacks, and proposed Redemption, an end-point solution that detects ransomware attacks while achieving zero data loss. We provided a generic system architecture and implemented a prototype for the Microsoft Windows operating system. We performed several experiments to show that Redemption meets the requirements discussed earlier in this chapter.
Acknowledgments. The research presented in Chapter 2 is based on the author's previously published work:
Amin Kharraz, William Robertson, Davide Balzarotti, Leyla Bilge, Engin Kirda, Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks, 12th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA). Milan, Italy, 2015.
The research presented in Chapter 3 is based on the author's previously published paper:
Amin Kharraz, Sajjad Arshad, Collin Mulliner, William Robertson, Engin Kirda, UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware, USENIX Security Symposium 2016. Austin, Texas, August 2016.
The research presented in Chapter 4 is based on the author's previously published paper:
Amin Kharraz, Engin Kirda, Redemption: Real-time Protection Against Ransomware at End-Hosts, The 20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2017). Atlanta, Georgia, September 2017.
The author was also involved in a set of research papers that are not directly related to this thesis [19, 20, 55], and would like to thank the collaborators for producing great work in this area.
Bibliography
[1] Minotaur Analysis - Malware Repository. minotauranalysis.com.
[2] VX Vault - Online Repository of Malware Samples. vxvault.siri-urz.net.
[3] Malware Tips - Your Security Advisor. http://malwaretips.com/forums/virus-exchange.104/.
[4] MalwareBlackList - Online Repository of Malicious URLs. http://www.malwareblacklist.com.
[5] Proof-of-concept Automated Baremetal Malware Analysis Framework. https://code.google.com/p/nvmtrace/.
[6] Police ransomware threat assessment. Europol Public Information, 2014.
[7] BitBlaze Malware Analysis Service. http://bitblaze.cs.berkeley.edu/, 2016.
[8] AutoIt. https://www.autoitscript.com/site/autoit/, 2016.
[9] IOzone Filesystem Benchmark. www.iozone.org, 2016.
[10] SilentCrypt: A new ransomware family. https://www.youtube.com/watch?v=qiASKA4BMck, 2016.
[11] Anand Ajjan. Ransomware: Next-Generation Fake Antivirus. http://www.sophos.com/en-us/medialibrary/PDFs/technicalpapers/SophosRansomwareFakeAntivirus.pdf, 2013.
[12] Alex Hern. Major sites including New York Times and BBC hit by ransomware malvertising. https://www.theguardian.com/technology/2016/mar/16/major-sites-new-york-times-bbc-ransomware-malvertising, 2016.
[13] Alex Hern. Ransomware threat on the rise as almost 40 percent of businesses attacked. https://www.theguardian.com/technology/2016/aug/03/ransomware-threat-on-the-rise-as-40-of-businesses-attacked, 2016.
[14] Amin Kharraz. A brief demo on how Redemption operates. https://www.youtube.com/watch?v=iuEgFVz7a7g, 2016.
[15] Andrew Dalton. Hospital paid 17K ransom to hackers of its computer network. http://bigstory.ap.org/article/d89e63ffea8b46d98583bfe06cf2c5af/hospital-paid-17k-ransom-hackers-its-computer-network, 2016.
[16] Manos Antonakakis, Roberto Perdisci, David Dagon, Wenke Lee, and NickFeamster. Building a dynamic reputation system for dns. In Proceedings of the19th USENIX conference on Security, pages 18�18. USENIX Association, 2010.
[17] Manos Antonakakis, Roberto Perdisci, Wenke Lee, Nikolaos Vasiloglou, II, andDavid Dagon. Detecting malware domains at the upper dns hierarchy. InProceedings of the 20th USENIX Conference on Security, SEC'11, pages 27�27,Berkeley, CA, USA, 2011. USENIX Association.
[18] Manos Antonakakis, Roberto Perdisci, Yacin Nadji, Nikolaos Vasiloglou, SaeedAbu-Nimeh, Wenke Lee, and David Dagon. From throw-away tra�c to bots:Detecting the rise of dga-based malware. In Presented as part of the 21stUSENIX Security Symposium (USENIX Security 12), pages 491�506, Bellevue,WA, 2012. USENIX.
[19] Sajjad Arshad, Amin Kharraz, and William Robertson. Identifying extension-based ad injection via fine-grained web content provenance. In Proceedings of the 19th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), September 2016.
[20] Sajjad Arshad, Amin Kharraz, and William Robertson. Include me out: In-browser detection of malicious third-party content inclusions. In Proceedings of the 20th International Conference on Financial Cryptography and Data Security (FC), February 2016.
[21] Ulrich Bayer, Christopher Kruegel, and Engin Kirda. TTAnalyze: A Tool for Analyzing Malware. In Proceedings of the European Institute for Computer Antivirus Research Annual Conference, April 2006.
[22] BBC News. University pays 20,000 Dollars to ransomware hackers. http://www.bbc.com/news/technology-36478650, 2016.
[23] Blockchain.info. Bitcoin Block Explorer. https://blockchain.info.
[24] Brian M Bowen, Shlomo Hershkop, Angelos D Keromytis, and Salvatore J Stolfo. Baiting inside attackers using decoy documents. Springer, 2009.
[25] Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Detecting self-mutating malware using control-flow graph matching. In Detection of Intrusions and Malware & Vulnerability Assessment, pages 129–143. Springer, 2006.
[26] Brian Carrier. File System Forensic Analysis. Addison-Wesley Professional, 2005.
[27] Catalin Cimpanu. Breaking Bad Ransomware Completely Undetected by VirusTotal. http://news.softpedia.com/news/breaking-bad-ransomware-goes-completely-undetected-by-virustotal-493265.shtml, 2015.
[28] Charlie Osborne. Researchers launch another salvo at CryptXXX ransomware. http://www.zdnet.com/article/researchers-launch-another-salvo-at-cryptxxx-ransomware/, 2016.
[29] Chris Francescani. Ransomware Hackers Blackmail U.S. Police Departments. http://www.cnbc.com/2016/04/26/ransomware-hackers-blackmail-us-police-departments.html, 2016.
[30] Nicolas Christin. Traveling the silk road: A measurement analysis of a large anonymous online marketplace. In Proceedings of WWW 2013, May 2013.
[31] Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Mining specifications of malicious behavior. In Proceedings of the 1st India Software Engineering Conference, pages 5–14. ACM, 2008.
[32] Mihai Christodorescu, Somesh Jha, Sanjit A Seshia, Dawn Song, and Randal E Bryant. Semantics-aware malware detection. In Security and Privacy, 2005 IEEE Symposium on, pages 32–46. IEEE, 2005.
[33] Cisco, Inc. Ransomware on Steroids: Cryptowall 2.0. http://blogs.cisco.com/security/talos/cryptowall-2, 2015.
[34] Connor Mannion. Three U.S. Hospitals Hit in String of Ransomware Attacks. http://www.nbcnews.com/tech/security/three-u-s-hospitals-hit-string-ransomware-attacks-n544366, 2016.
[35] Andrea Continella, Alessandro Guagnelli, Giovanni Zingaro, Giulio De Pasquale, Alessandro Barenghi, Stefano Zanero, and Federico Maggi. ShieldFS: A self-healing, ransomware-aware filesystem. In Proceedings of the 32nd Annual Conference on Computer Security Applications, pages 336–347. ACM, 2016.
[36] Marco Cova, Corrado Leita, Olivier Thonnard, Angelos D. Keromytis, and Marc Dacier. An Analysis of Rogue AV Campaigns. In Proceedings of the International Conference on Recent Advances in Intrusion Detection, pages 442–463, 2010.
[37] Cuckoo Foundation. Cuckoo Sandbox: Automated Malware Analysis. www.cuckoosandbox.org, 2015.
[38] Dan Whitcomb. California lawmakers take step toward outlawing ransomware. http://www.reuters.com/article/us-california-ransomware-idUSKCN0X92PA, 2016.
[39] Dell SecureWorks. Cryptolocker Ransomware. http://www.secureworks.com/cyber-threat-intelligence/threats/cryptolocker-ransomware/, 2014.
[40] Dell SecureWorks. University of Calgary paid 20K in ransomware attack. http://www.cbc.ca/news/canada/calgary/university-calgary-ransomware-cyberattack-1.3620979, 2016.
[41] Brian Donohue. Reveton Ransomware Adds Password Purloining Function. http://threatpost.com/reveton-ransomeware-adds-password-purloining-function/100712, 2013.
[42] Fergal Reid and Martin Harrigan. An analysis of anonymity in the bitcoin system. In Security and Privacy in Social Networks, 2012.
[43] Alexandre Gazet. Comparative analysis of various ransomware virii. Journal in Computer Virology, 6(1):77–90, February 2010.
[44] Gregory Wolf. 8 High Profile Ransomware Attacks You May Not Have Heard Of. https://www.linkedin.com/pulse/8-high-profile-ransomware-attacks-you-may-have-heard-gregory-wolf, 2016.
[45] Chris Grier, Lucas Ballard, Juan Caballero, Neha Chachra, Christian J Dietrich, Kirill Levchenko, Panayiotis Mavrommatis, Damon McCoy, Antonio Nappa, Andreas Pitsillidis, et al. Manufacturing compromise: the emergence of exploit-as-a-service. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, pages 821–832, 2012.
[46] Greg Hoglund and Jamie Butler. Rootkits: Subverting the Windows Kernel. Addison-Wesley Professional, 2005.
[47] International Secure Systems Lab. Anubis - Malware Analysis for Unknown Binaries. https://anubis.iseclab.org/, 2015.
[48] Jashua Tully. An Anti-Reverse Engineering Guide. http://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide#StolenBytes, 2008.
[49] Jerry Zremski. New York Senator Seeks to Combat Ransomware. http://www.govtech.com/security/New-York-Senator-Seeks-to-Combat-Ransomware.html, 2016.
[50] Ari Juels and Ronald L Rivest. Honeywords: Making password-cracking detectable. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 145–160. ACM, 2013.
[51] Yuhei Kawakoya, Makoto Iwamura, Eitaro Shioji, and Takeo Hariu. API Chaser: Anti-analysis resistant malware analyzer. In Research in Attacks, Intrusions, and Defenses, pages 123–143. Springer, 2013.
[52] Kevin Savage, Peter Coogan, and Hon Lau. The Evolution of Ransomware. http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/the-evolution-of-ransomware.pdf, 2015.
[53] Amin Kharraz, Sajjad Arshad, Collin Mulliner, William Robertson, and Engin Kirda. UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware. In 25th USENIX Security Symposium, 2016.
[54] Amin Kharraz and Engin Kirda. Redemption: Real-time protection against ransomware at end-hosts. In Proceedings of the 20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), September 2017.
[55] Amin Kharraz, Engin Kirda, William Robertson, Davide Balzarotti, and Aurelien Francillon. Optical Delusions: A Study of Malicious QR Codes in the Wild. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2014.
[56] Amin Kharraz, William Robertson, Davide Balzarotti, Leyla Bilge, and Engin Kirda. Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), July 2015.
[57] Dhilung Kirat, Giovanni Vigna, and Christopher Kruegel. BareBox: Efficient malware analysis on bare-metal. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 403–412. ACM, 2011.
[58] Dhilung Kirat, Giovanni Vigna, and Christopher Kruegel. BareCloud: Bare-metal analysis-based evasive malware detection. In 23rd USENIX Security Symposium (USENIX Security 14), pages 287–301. USENIX Association, 2014.
[59] Engin Kirda, Christopher Kruegel, Greg Banks, Giovanni Vigna, and Richard Kemmerer. Behavior-based spyware detection. In USENIX Security, volume 6, 2006.
[60] Clemens Kolbitsch, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda, Xiao-yong Zhou, and XiaoFeng Wang. Effective and efficient malware detection at the end host. In USENIX Security Symposium, pages 351–366, 2009.
[61] Clemens Kolbitsch, Engin Kirda, and Christopher Kruegel. The power of procrastination: detection and mitigation of execution-stalling malicious code. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 285–296. ACM, 2011.
[62] Eugene Kolodenker, William Koch, Gianluca Stringhini, and Manuel Egele. PayBreak: Defense against cryptographic ransomware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 599–611, New York, NY, USA, 2017. ACM.
[63] Brian Krebs. Inside a Reveton Ransomware Operation. http://krebsonsecurity.com/2012/08/inside-a-reveton-ransomware-operation/, 2012.
[64] Brian Krebs. FBI: North Korea to Blame for Sony Hack. http://krebsonsecurity.com/2014/12/fbi-north-korea-to-blame-for-sony-hack/, 2014.
[65] Christopher Kruegel, Engin Kirda, Darren Mutz, William Robertson, and Giovanni Vigna. Polymorphic worm detection using structural information of executables. In Recent Advances in Intrusion Detection, pages 207–226. Springer, 2006.
[66] Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. AccessMiner: Using system-centric models for malware protection. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS '10, pages 399–412. ACM, 2010.
[67] Lawrence Abrams. TeslaCrypt Decrypted: Flaw in TeslaCrypt allows Victims to Recover their Files. http://www.bleepingcomputer.com/news/security/teslacrypt-decrypted-flaw-in-teslacrypt-allows-victims-to-recover-their-files/, 2016.
[68] Wei-Jen Li, Ke Wang, Salvatore J Stolfo, and Benjamin Herzog. Fileprints: Identifying file types by n-gram analysis. In Information Assurance Workshop, 2005. IAW'05. Proceedings from the Sixth Annual IEEE SMC, pages 64–71. IEEE, 2005.
[69] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37:145–151, 1991.
[70] Martina Lindorfer, Clemens Kolbitsch, and Paolo Milani Comparetti. Detecting environment-sensitive malware. In Recent Advances in Intrusion Detection, pages 338–357. Springer, 2011.
[71] Malware Don't Need Coffee. Guess who's back again? Cryptowall 3.0. http://malware.dontneedcoffee.com/2015/01/guess-whos-back-again-cryptowall-30.html, 2015.
[72] Lorenzo Martignoni, Elizabeth Stinson, Matt Fredrikson, Somesh Jha, and John C Mitchell. A layered architecture for detecting malicious behaviors. In Recent Advances in Intrusion Detection, pages 78–97. Springer, 2008.
[73] McAfee Labs. McAfee Labs 2017 Threat Predictions Report. https://www.mcafee.com/us/resources/reports/rp-threats-predictions-2017.pdf, 2017.
[74] Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. A fistful of bitcoins: Characterizing payments among men with no names. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC '13, pages 127–140, 2013.
[75] Michael Mimoso. Leaked NSA Exploit Spreading Ransomware Worldwide. https://threatpost.com/leaked-nsa-exploit-spreading-ransomware-worldwide/125654/, 2017.
[76] Microsoft, Inc. Blocking Direct Write Operations to Volumes and Disks. https://msdn.microsoft.com/en-us/library/windows/hardware/ff551353(v=vs.85).aspx.
[77] Microsoft, Inc. Microsoft Security Intelligence Report Vol. 16. http://www.microsoft.com/security/sir/default.aspx, 2013.
[78] Microsoft, Inc. File System Minifilter Drivers. https://msdn.microsoft.com/en-us/library/windows/hardware/ff540402%28v=vs.85%29.aspx, 2014.
[79] Microsoft, Inc. Protecting Anti-Malware Services. https://msdn.microsoft.com/en-us/library/windows/desktop/dn313124(v=vs.85).aspx, 2016.
[80] Malte Möser. Anonymity of bitcoin transactions: An analysis of mixing services. In Proceedings of Monster Bitcoin Conference, 2013.
[81] Ms. Smith. Kansas Heart Hospital hit with ransomware; attackers demand two ransoms. http://www.networkworld.com/article/3073495/security/kansas-heart-hospital-hit-with-ransomware-paid-but-attackers-demanded-2nd-ransom.html, 2016.
[82] Terry Nelms, Roberto Perdisci, and Mustaque Ahamad. ExecScent: Mining for new C&C domains in live networks with adaptive control protocol templates. In USENIX Security, pages 589–604, 2013.
[83] Nick Nikiforakis, Marco Balduzzi, S. Van Acker, W. Joosen, and Davide Balzarotti. Exposing the lack of privacy in file hosting services. In Proceedings of the 4th USENIX Conference on Large-scale Exploits and Emergent Threats (LEET), LEET 11. USENIX Association, March 2011.
[84] No-More-Ransomware Project. No More Ransomware! https://www.nomoreransom.org/about-the-project.html, 2016.
[85] Nolen Scaife, Henry Carter, Patrick Traynor, and Kevin Butler. CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data. In IEEE International Conference on Distributed Computing Systems (ICDCS), 2016.
[86] Gavin O'Gorman and Geoff McDonald. Ransomware: A Growing Menace. http://www.symantec.com/connect/blogs/ransomware-growing-menace, 2012.
[87] Payload Security Inc. Payload Security. https://www.hybrid-analysis.com, 2016.
[88] Brian Prince. CryptoLocker Could Herald Rise of More Sophisticated Ransomware. http://www.darkreading.com/attacks-breaches/cryptolocker-could-herald-rise-of-more-sophisticated-ransomware, 2013.
[89] QuickBT. Disturbing Bitcoin Virus. http://www.reddit.com/r/Bitcoin/comments/1o53hl/, October 2013.
[90] Babak Rahbarinia, Roberto Perdisci, and Manos Antonakakis. Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks. In DSN, pages 403–414. IEEE, 2015.
[91] Ray Smith. Tesseract Open Source OCR Engine. https://github.com/tesseract-ocr/tesseract, 2015.
[92] REAQTA Inc. HydraCrypt Ransomware. https://reaqta.com/2016/02/hydracrypt-ransomware/, 2016.
[93] Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction graph. In Financial Cryptography and Data Security 2013, volume 7859, pages 6–24, 2013.
[94] Christian Rossow, Christian J Dietrich, Chris Grier, Christian Kreibich, Vern Paxson, Norbert Pohlmann, Herbert Bos, and Maarten Van Steen. Prudent practices for designing malware experiments: Status quo and outlook. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 65–79. IEEE, 2012.
[95] Matthew G Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J Stolfo. Data mining methods for detection of new malicious executables. In Security and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Symposium on, pages 38–49. IEEE, 2001.
[96] Sophos, Inc. Security Threat Report 2014, Smarter, Shadier, Stealthier Malware. http://www.sophos.com/en-us/medialibrary/PDFs/other/sophos-security-threat-report-2014.pdf, 2014.
[97] Michele Spagnuolo, Federico Maggi, and Stefano Zanero. BitIodine: Extracting intelligence from the bitcoin network. In Financial Cryptography and Data Security, Lecture Notes in Computer Science (LNCS). Springer-Verlag, March 2014.
[98] Elizabeth Stinson and John C Mitchell. Characterizing bots' remote control behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 89–108. Springer, 2007.
[99] Brett Stone-Gross, Ryan Abman, Richard A. Kemmerer, Christopher Kruegel, Douglas G. Steigerwald, and Giovanni Vigna. The Underground Economy of Fake Antivirus Software. In Proceedings of the Workshop on the Economics of Information Security and Privacy, 2013.
[100] Andrew H Sung, Jianyun Xu, Patrick Chavez, and Srinivas Mukkamala. Static analyzer of vicious executables (SAVE). In Computer Security Applications Conference, 2004. 20th Annual, pages 326–334. IEEE, 2004.
[101] Symantec, Inc. Internet Security Threat Report. http://www.symantec.com/security_response/publications/threatreport.jsp, 2017.
[102] The Cyber Threat Alliance. Lucrative Ransomware Attacks: Analysis of Cryptowall Version 3 Threat.
[103] TrendLabs. An Onslaught of Online Banking Malware and Ransomware. http://apac.trendmicro.com/cloud-content/apac/pdfs/security-intelligence/reports/rpt-cashing-in-on-digital-information.pdf, 2013.
[104] Amit Vasudevan and Ramesh Yerraballi. Cobra: Fine-grained malware analysis using stealth localized-executions. In Security and Privacy, 2006 IEEE Symposium on, 2006.
[105] Giovanni Vigna. From Anubis and Wepawet to Llama. http://info.lastline.com/blog/from-anubis-and-wepawet-to-llama, June 2014.
[106] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, 2004.
[107] WIRED Magazine. Why Hospitals Are the Perfect Targets for Ransomware. https://www.wired.com/2016/03/ransomware-why-hospitals-are-the-perfect-targets/, 2016.
[108] J-Y Xu, Andrew H Sung, Patrick Chavez, and Srinivas Mukkamala. Polymorphic malicious executable scanner by API sequence analysis. In Hybrid Intelligent Systems, 2004. HIS'04. Fourth International Conference on, pages 378–383. IEEE, 2004.
[109] Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 116–127. ACM, 2007.
[110] Adam Young and Moti Yung. Cryptovirology: Extortion-based security threats and countermeasures. In Security and Privacy, 1996. Proceedings., 1996 IEEE Symposium on, pages 129–140. IEEE, 1996.
[111] Adam L. Young. Building a Cryptovirus Using Microsoft's Cryptographic API. In Proceedings of the International Conference on Information Security, pages 389–401, 2005.
[112] Jim Yuill, Mike Zappe, Dorothy Denning, and Fred Feer. Honeyfiles: deceptive files for intrusion detection. In Information Assurance Workshop, 2004. Proceedings from the Fifth Annual IEEE SMC, pages 116–122. IEEE, 2004.