
Techniques and Solutions for Addressing Ransomware Attacks

A dissertation presented in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the field of

Information Assurance

by

Amin Kharraz

College of Computer and Information Science

Northeastern University

Ph.D. Committee

Engin Kirda Advisor, Northeastern University

William Robertson Advisor, Northeastern University

Long Lu Internal Member, Northeastern University

Manuel Egele External Member, Boston University

December 2017

Abstract

Ransomware is a form of extortion-based attack that locks the victim's digital resources and requests money to release them. Although the concept of ransomware is not new (i.e., such attacks date back at least as far as the 1980s), this type of malware has recently experienced a resurgence in popularity. In fact, over the last few years, a number of high-profile ransomware attacks were reported. Very recently, the WannaCry ransomware infected thousands of vulnerable machines around the world, and substantially disrupted critical services such as the British healthcare system. Given the size and variety of threats we are facing today, having solutions to effectively detect and analyze unknown ransomware attacks seems necessary.

In this thesis, we argue that it is possible to develop novel defense mechanisms, and protect user data from a large number of ransomware attacks with zero data loss. We show that such an approach is both feasible and effective. To support this claim, we investigate how a successful ransomware attack interacts with the operating system resources. In the first part of this thesis, we perform an evolutionary-based analysis to understand the destructive behavior of these attacks. We show that by monitoring filesystem activity, it is possible to design practical defense systems that could stop a large number of ransomware attacks, even those using sophisticated encryption capabilities. In the second part, we propose a novel dynamic analysis system, called Unveil, that is designed to analyze ransomware attacks, and model their interactions with the filesystem. In the third and last part, we propose an end-point framework, called Redemption, to protect user data from ransomware attacks. We demonstrate that our proposed solution can be retrofitted into existing operating systems, and achieve zero data loss against current ransomware families without introducing alarm fatigue.


Acknowledgment

A Ph.D. career is a long road. It requires patience and perseverance. During the Ph.D. career, you may define uninteresting projects, fail several times, or even ask wrong questions, but it is fine since graduate school is the best place to fail and learn from your failures. Eventually, at some point, you learn what problems are interesting, how to define undefined problems, and why the community should care about these problems. In fact, these are parts of the valuable skillset that one earns during his/her Ph.D. career.

Getting a Ph.D. is not possible without having great people around. I would like to particularly thank Prof. Engin Kirda, my great advisor, who was always supporting me from the very beginning. I also would like to thank my co-advisor, Prof. Wil Robertson. Wil helped me a lot in developing great ideas. It was an honor to be part of Boston SecLab and work with him.

I thank my committee members Prof. Long Lu and Prof. Manuel Egele for reading

this thesis, and providing valuable feedback.

I thank all the great teachers I had in Bushehr, my hometown. Mr. Rashidpour, my math teacher, was a great mentor, and was one of the people that encouraged me to become an engineer.

I also thank the most recent cadre of students, coworkers, and friends who made grad school so much fun, including Andrea, Michael, Kaan, Sajjad, Saman, Reza, Ahmet, Talha, Tobi, Sevtap, and Amirali.

I thank my parents and my wonderful in-laws for their love and support. Lastly, I thank my wonderful wife, Mahboubeh, for bearing with me through difficult times and making life much easier. Mahboubeh supported me on every decision I made in this journey, helped me to resume life after each paper rejection and resurrect dreams after failures. Thank you my sweetheart!


Contents

1 Introduction
1.1 Motivation for Novel Defenses
1.2 Limitations of Current Defense Mechanisms
1.3 Overview of the Dissertation

2 An Analysis on Current Ransomware Attacks
2.1 Introduction
2.2 Ransomware Data Set
2.2.1 Experimental Setup
2.3 Characterization and Evolution
2.3.1 File System Activity
2.3.2 Mitigation Strategies
2.4 Financial Incentives
2.4.1 Bitcoin as a Charging Method
2.5 Related Work
2.6 Conclusions

3 A Dynamic Analysis Approach to Detecting Ransomware
3.1 Introduction
3.2 Background
3.3 Unveil Design
3.3.1 Detecting File Lockers
3.3.2 Detecting Screen Lockers
3.4 Unveil Implementation
3.4.1 Generating User Environments
3.4.2 Filesystem Activity Monitor
3.4.3 Desktop Lock Monitor
3.5 Evaluation
3.5.1 Experimental Setup
3.5.2 Ground Truth (Labeled) Dataset
3.5.3 Detecting Zero-Day Ransomware
3.5.4 Case Study: Automated Detection of a New Ransomware Family
3.6 Discussion and Limitations
3.7 Related Work
3.8 Conclusions

4 Protecting End-Points from Ransomware Attacks
4.1 Introduction
4.2 Related Work
4.3 Threat Model
4.4 Design Overview
4.5 Detection Approach
4.5.1 Content-based Features
4.5.2 Behavior-based Features
4.5.3 Evaluating the Feature Set
4.5.4 Malice Score Calculation (MSC) Function
4.6 Implementation
4.7 Evaluation
4.7.1 Dataset
4.7.2 Detection Results
4.7.3 Disk I/O and File System Benchmarks
4.7.4 Real-world Application Testing
4.7.5 Usability Experiments
4.8 Discussion and Limitations
4.9 Conclusions

5 Conclusions


List of Figures

2-1 The malicious process attempts to get the file map on the physical disk to overwrite the file's data after the encryption.
2-2 Disk layout for files with different sizes in NTFS file system.
2-3 A ransomware attack (Gpcode) with a simple delete operation.
2-4 The amount of ransom money among common ransomware families.
2-5 The number of Bitcoins per address.
2-6 The total number of transactions per Bitcoin address.
2-7 The duration of activity for Bitcoin addresses.
2-8 The duration of activity for Bitcoin addresses.

3-1 Overview of the design of I/O access monitor in Unveil.
3-2 Different attack strategies among ransomware families with respect to I/O access patterns.
3-3 Precision-recall analysis of the tool.
3-4 Evolution of VT scanner reports after six submissions.
3-5 I/O activities of a previously unknown ransomware family detected by Unveil.

4-1 Redemption mediates the access to the file system and redirects each write request on the user files to a protected area without changing the status of the original file.
4-2 The steps involved in committing the benign changes to the files.
4-3 TP/FP analysis of Redemption based on the best threshold value.


List of Tables

2.1 The list of malware families used in our experiments.
2.2 The IRP requests generated during a CryptoWall attack.
2.3 The types of IRPs requested by a malicious process to encrypt and overwrite the victim's files during a ransomware attack.
2.4 A set of IRP requests generated on behalf of a malicious process to delete files during an attack.
2.5 Summary of types of charges in 15 ransomware families.

3.1 The list of ransomware families used in the first experiment.
3.2 The list of benign applications that generate similar I/O access patterns to ransomware.
3.3 An example of I/O access in Unveil for CryptoWall 3.0 and CryptoWall 4.0.
3.4 I/O accesses for deletion and compression mechanisms in benign/malicious applications.
3.5 Unveil detection results.

4.1 A list of benign applications and their malice scores.
4.2 A list of ransomware families and their malice scores.
4.3 The ransomware families used to test Redemption and other proposed techniques.
4.4 Disk I/O performance in a standard and a Redemption-protected host.
4.5 Runtime overhead of Redemption on a set of end-point applications.


Chapter 1

Introduction

1.1 Motivation for Novel Defenses

Malware attacks continue to remain one of the most popular attack vectors in the wild [101, 73]. Among all classes of malware, ransomware has recently become very popular among malware authors [12, 29, 34, 44]. Ransomware is a kind of scareware that locks the victims' computers until they make a payment to regain access to their data. In fact, this class of malware is not a new concept (such attacks have been in the wild since the last decade), but the growing number of high-profile ransomware attacks has resulted in increasing concerns on how to defend against this class of malware.

In 2016, several public and private sectors, including the healthcare industry, were impacted by ransomware [22, 13, 107]. Lately, U.S. officials have also expressed their concerns about ransomware [38, 49], and even asked the U.S. government to focus on fighting ransomware under the Cybersecurity National Action Plan [49]. Very recently, WannaCry ransomware, the most recent successful ransomware attack, impacted thousands of users around the world by exploiting the EternalBlue vulnerability, encrypting user data, and demanding bitcoin payments in exchange for unlocking files [75].

In response to the increasing number of ransomware attacks, users are often advised to create backups of their critical data. Certainly, having a reliable data backup policy minimizes the potential costs of being infected with ransomware, and is an important part of the IT management process. However, the growing number of paying victims [15, 81, 40] suggests that technically unsophisticated users, who are the main target of these attacks, do not follow these recommendations, and easily become paying victims of ransomware. Hence, ransomware authors continue to create new attacks and evolve their creations, as evidenced by the emergence of more sophisticated ransomware every day [103, 11, 101, 86].

Unfortunately, many of the recent security reports about ransomware [33, 52, 101, 73] mainly focus on the advancements in ransomware attacks and their levels of sophistication, rather than providing insights about effective defense techniques that should be adopted against this threat. Furthermore, the current defense mechanisms to detect, analyze, and defend against ransomware are not very different from the ones that are used to detect other types of evasive malware. Perhaps the main assumption here is that this class of malware employs all possible evasion techniques, similar to other classes of malware, to bypass detection tools, reach end-users, and successfully launch attacks. While we agree that this is a valid assumption, we claim that these mechanisms cannot lead to the best defenses against ransomware, as evidenced by the increasing number of very successful ransomware attacks in the wild.


1.2 Limitations of Current Defense Mechanisms

Ransomware attacks share some similarities with other types of malware, particularly in utilizing evasion techniques, payload distribution and infection mechanisms. For example, similar to other types of malware attacks, opening email attachments or clicking on malicious advertisements can increase the risk of being infected with ransomware. Consequently, some of the current techniques that are used to identify suspicious links or payloads in email attachments are still useful in detecting malicious binaries that deliver ransomware.

Furthermore, like other classes of malware, most current ransomware samples need to communicate with Command and Control (C&C) servers to receive commands and eventually run attacks on infected machines (e.g., requesting the encryption key from the remote server). Consequently, the techniques to analyze DGA-based domains or identify malicious network traffic [16, 82, 90, 18, 17] can be incorporated in detecting some types of ransomware attacks. Similarly, some of the static analysis techniques such as Portable Executable (PE) analysis tools or packer detection techniques can still provide helpful reports about the corresponding malicious binary. However, current tools cannot provide useful information about the specific behavior of a given ransomware sample to security analysts, which ultimately results in an increasing number of unknown attacks on end-user machines, as evidenced by a set of very successful ransomware attacks.

Unlike most modern malware attacks, ransomware attacks are not usually designed to be stealthy after the infection phase, as the whole point of the attack is to notify victims that their machines are infected. Furthermore, the core functionality of a ransomware sample (i.e., the cryptosystem), which results in data destruction, works very similarly to a class of benign applications that are often used for archiving or privacy-preserving purposes. In fact, the similarity of ransomware functionality to a set of benign applications, as well as the differences from other types of malware attacks in the attack strategy, have made the current automated analysis techniques less effective in detecting such attacks and protecting end-users. Therefore, it is quite useful to develop tools that can accurately extract ransomware behaviors, in light of these similarities and differences, and improve current automated analysis systems to detect this class of malware more effectively.

1.3 Overview of the Dissertation

In this thesis, we investigate the feasibility of developing solutions to detect and analyze ransomware attacks. In fact, the thesis of this dissertation is that, unlike other malware, the nature of ransomware attacks is not very broad, and protecting against a large number of ransomware attacks is possible. We argue that ransomware attacks follow very similar patterns in order to be successful and force victims to pay the ransom fee. For example, unlike other classes of malware that aim to be stealthy to collect banking credentials or keystrokes without raising suspicion, ransomware notifies victims that they are infected. Moreover, a successful ransomware attack usually needs to prevent the user's access to their own data by performing encryption and/or deletion operations, and repeating these destructive actions during an attack. This thesis aims to show that if we use these insights on the defense side, and accurately model these behaviors, we can reliably detect a significant number of ransomware attacks in the wild.

In the first part of this thesis, we perform an evolutionary-based analysis on ransomware attacks to understand the main characteristics of these attacks [56]. This work is motivated by our need to study the core functionalities of these attacks from a filesystem perspective. To this end, we created a dataset of ransomware samples that covers the majority of the existing ransomware families which have been observed in the wild. We design and implement a kernel-level module to closely monitor the interaction of user mode processes with the filesystem. Our analysis shows that different classes of ransomware attacks with multiple levels of sophistication share very similar characteristics from a filesystem perspective due to the nature of these attacks.

In the second part of this thesis, we present a novel dynamic analysis system, called Unveil [53], that is designed to analyze ransomware attacks and model their behaviors. In our approach, the system automatically creates an artificial, realistic execution environment and monitors how ransomware interacts with that environment. We evaluate Unveil using more than 148,000 distinct samples belonging to different malware families. The evaluation of Unveil shows that our approach was able to correctly detect 13,637 ransomware samples from multiple ransomware families in a real-world data feed with zero false positives. Our analysis shows that Unveil can significantly enhance the current anti-malware solutions with regard to ransomware.

In the third part of the thesis, we investigate the possibility of protecting user data from ransomware attacks at end-hosts with zero data loss. To this end, we propose a general framework, called Redemption [54], to augment the operating system with ransomware protection capabilities. Redemption does not require performing any significant changes in the semantics of the underlying filesystem functionality, or modifying the architecture of the operating systems.

This thesis consists of the following sections: In Section 2, we provide an overview of current ransomware attacks and the techniques they use. In Section 3, we describe a dynamic analysis system that is specifically designed to detect and analyze ransomware samples. Section 4 describes an end-point solution to protect the consistent state of user data during a ransomware attack. Finally, Section 5 concludes the thesis.


Chapter 2

An Analysis on Current Ransomware Attacks

2.1 Introduction

Over the past few years, a class of malware known as scareware has become popular among cybercriminals. This malware takes advantage of people's fear of revealing their private information, losing their critical data, or facing irreversible hardware damage. In particular, this work focuses on ransomware, a particular class of scareware that locks the victims' computers until they make a payment to regain access to their data.

Although the first version of ransomware appeared in the wild almost 10 years ago, the volume of ransomware incidents was not significant until a couple of years ago. As the number of ransomware attacks increased by over 500% in 2013 compared to the previous years, the ransomware threat made the headlines as the most notable malware trend after targeted attacks in 2013 [101]. For example, the Cryptolocker ransomware alone managed to infect approximately 250 thousand computers around the world, including an entire police department that needed to pay a ransom to decrypt their documents [39, 88].

Given the significant growth of ransomware attacks [101], it is very important to develop a protection technique against this type of malware. However, designing effective defense mechanisms is not practically possible without having an insightful understanding of these attacks. Currently, many of the recent security reports about ransomware [33, 39] rely on ad-hoc procedures rather than a scientific assessment. Moreover, these reports mainly focus on the advancements in ransomware attacks and their levels of sophistication, rather than providing insights about effective defense techniques that should be adopted against this threat. In this work, we investigate the key functionalities of ransomware samples such that we can propose effective detection mechanisms leveraging our findings.

We created a collection of ransomware samples that were categorized into 15 different families. Our data set covers the majority of the existing ransomware families that have been observed in the wild between 2006 and 2014. The data set is created using multiple sources including manual and automatic crawling of public malware repositories, and the ransomware samples submitted to Anubis [21] since 2011. The results of our analysis confirm the folk wisdom that such attacks show a continuous increase in the number of families and distinct samples per year [77, 101], as well as advances in certain aspects of the specific functionalities of a few ransomware families. However, our results also reveal that in a significant number of samples, the core parts of ransomware samples lack the technical complexity to perform successful attacks. While a small fraction of the samples can really prevent the victims from accessing the resources and cause severe problems, a significant number of samples fail to seriously take the victims' resources as hostage. More specifically, we show that more than 94% of ransomware samples in our data set simply try to lock the victims' computer desktop and request ransom, or use very similar and superficial approaches to target the victims' resources.

We also performed an analysis of the charging methods adopted by different ransomware families and also traced the transactions of 1,872 Bitcoin addresses that were used during the Cryptolocker attack. The analysis of the transactions shows that cybercriminals started to adopt evasive techniques (e.g., using new addresses for each infection to keep the balances low) in order to better conceal the criminal activity of the Bitcoin accounts. Our analysis also confirms that the Bitcoin addresses used for malicious intents share similar transaction records (e.g., short activity period, small Bitcoin amounts, small number of transactions). However, determining malicious addresses in the Bitcoin network based on the transaction history is significantly difficult, in particular when cybercriminals use multiple independent addresses with small amounts of Bitcoins.
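As a rough illustration of the three indicators named above (activity period, Bitcoin amount, and number of transactions), the following sketch summarizes a list of transactions per address. The `Tx` record and the `summarize` function are hypothetical names introduced only for this example, and the input is assumed to have been exported from the blockchain beforehand; it is not the tooling used in this study.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tx:
    # Hypothetical, simplified transaction record for illustration only.
    address: str
    timestamp: datetime
    amount_btc: float


def summarize(txs):
    """Return per-address activity period (days), transaction count, and total BTC."""
    stats = {}
    for tx in txs:
        s = stats.setdefault(
            tx.address,
            {"first": tx.timestamp, "last": tx.timestamp, "count": 0, "total": 0.0},
        )
        s["first"] = min(s["first"], tx.timestamp)
        s["last"] = max(s["last"], tx.timestamp)
        s["count"] += 1
        s["total"] += tx.amount_btc
    return {
        addr: {
            "active_days": (s["last"] - s["first"]).days,
            "tx_count": s["count"],
            "total_btc": s["total"],
        }
        for addr, s in stats.items()
    }
```

Such a summary only describes an address; as noted above, it does not by itself separate malicious addresses from benign ones with a similar profile.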

In addition to our long-term study, we also evaluate the feasibility of implementing defense mechanisms against destructive ransomware attacks. We provide an analysis of the file system activity of ransomware samples that target users' files. Our analysis shows that different classes of ransomware attacks with multiple levels of sophistication share very similar characteristics from a file system perspective, due to the nature of these attacks. Our analysis suggests that when an infected system is under attack, one can notice a significant change in the file system activity since the malicious process generates a large number of similar file system access requests. Consequently, if we effectively monitor the file system activity (e.g., the changes in the Master File Table (MFT) and the types of I/O Request Packets (IRP) to the file system), it is possible to detect multiple different types of destructive ransomware attacks that target users' files. This contradicts recent discussions in the security community about the impossibility of detecting or stopping these types of attacks due to the use of sophisticated destructive techniques [6, 77, 96, 101]. Based on our analysis, we conclude that detecting and stopping a large number of destructive ransomware attacks is not as complex as it has been reported, and deploying practical defense mechanisms against these attacks is possible due to the engineering of the NTFS file system.
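The observation that an attacking process issues a large number of similar file system requests can be sketched as a sliding-window counter. This is a user-space illustration under stated assumptions, not the kernel-level monitor described in this work: the `BurstMonitor` class, the operation names, the window length, and the threshold are all hypothetical, and the events are assumed to be delivered by some external monitoring component.

```python
from collections import defaultdict, deque


class BurstMonitor:
    """Flags a process that issues many same-type requests within a short window."""

    def __init__(self, window_seconds=10.0, threshold=100):
        # Both values are illustrative assumptions, not tuned numbers.
        self.window = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # (pid, op) -> recent timestamps

    def observe(self, pid, op, timestamp):
        """Record one request; return True once the burst threshold is reached.

        Timestamps are assumed to arrive in non-decreasing order.
        """
        recent = self._events[(pid, op)]
        recent.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold
```

For example, with `BurstMonitor(window_seconds=1.0, threshold=3)`, three "write" events from the same process within one second trigger the flag, while events spaced several seconds apart do not.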

In summary, the contributions of this work are as follows:

• We analyzed 1,359 ransomware samples, describing previously undocumented aspects of ransomware attacks with the focus on distinctive and common behaviors among different families.

• We explain how the core parts of ransomware samples are engineered and how these findings can potentially be used to detect these attacks. Our analysis shows that the abnormal file system activity can be accurately monitored in destructive ransomware attacks with different levels of sophistication.

• We perform an analysis of charging methods adopted among ransomware families and also investigate how cybercriminals used cryptocurrency in recent ransomware attacks. Our analysis of illicitly-gained Bitcoins suggests that cybercriminals adopted multiple evasive techniques to protect their privacy in the Bitcoin network, making the tracing procedure significantly more difficult.

• We suggest avenues that can be used to defend against a large number of destructive zero-day ransomware attacks. We propose a general methodology to detect these attacks without making any assumptions on how they attack the users' files.

The rest of the chapter is structured as follows. In Section 2.2, we present our data set and the ransomware families we categorized. In Section 2.3, we present the experiments we conducted and discuss our findings. In Section 2.4, we discuss the financial incentives and payment methods. In Section 2.5, we briefly present related work. Finally, we conclude the chapter in Section 2.6.

2.2 Ransomware Data Set

Since collecting the malware data set was a critical part of our research, in this section, we provide some details about our ransomware sample selection procedure. To achieve a comprehensive ransomware data set, we collected malware samples from multiple sources. While we obtained 37.9% of our samples from Anubis, 48.38% were collected by automatically crawling public malware repositories [4, 2, 1]. We captured the remaining 13.8% by manually browsing through security forums [71, 3].

We collected 3,921 ransomware samples from all those sources. However, after removing the samples that did not execute properly in our environment and those for which we were not able to find a release date, our data set contained a total of 1,359 active ransomware samples. To obtain accurate labels for these samples, we cross-checked the malware samples by automatically submitting the list of MD5 hashes to VirusTotal. To be conservative in our ransomware selection, we consider a malware sample to be ransomware if at least three AV engines recognized it as belonging to this category.

To obtain the family names, we parsed the naming schemes of the AV vendors that are commonly used to assign malware labels. In 77% of samples, AV engines followed the same labeling scheme and our naming policy was mainly based on the popularity of the family name in the community (e.g., Gpcode, Reveton). The remaining 23% of the samples were labeled in an inconsistent way among the different antivirus software, and in this case we simply selected the most common label among the list of the top 39 AV engines. For example, some samples were labeled both as Pornoasset and as Tobfy by top AV engines, but we labeled these samples as Tobfy due to the perceived popularity of the label.
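The two labeling rules described above (at least three AV engines agreeing on the ransomware category, and the most common family label when engines disagree) can be sketched as follows. The function names and the input dictionaries, which map an engine name to its reported category or family, are hypothetical; parsing of actual AV reports is assumed to happen elsewhere and is not shown.

```python
from collections import Counter


def is_ransomware(av_categories, min_engines=3):
    """Accept a sample if at least `min_engines` engines report the ransomware category.

    `av_categories` maps engine name -> category string (hypothetical format).
    """
    hits = sum(1 for category in av_categories.values() if category == "ransomware")
    return hits >= min_engines


def family_label(av_families):
    """Return the most common family label among engines, or None if there is none.

    `av_families` maps engine name -> family string, or None when an engine
    reports no family (hypothetical format).
    """
    counts = Counter(label for label in av_families.values() if label)
    return counts.most_common(1)[0][0] if counts else None
```

For instance, a sample labeled Tobfy by two engines and Pornoasset by one would receive the label Tobfy under this majority rule.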

To the best of our knowledge, our analysis covers the majority of the existing ransomware families observed between 2006 and 2014. However, as our data collection module relies on external sources, we are aware of the possibility of missing some types of ransomware attacks. Furthermore, in order to conduct balanced experiments over the ransomware families, and also to avoid biased results due to polymorphic techniques, we performed our analysis not only based on individual samples, but also based on the families and distinct variants per family. Table 2.1 shows the total number of distinct samples per family as well as distinct variants in each family. It also shows the first time they appeared in the wild and the most recent sample in our data set.

As can be clearly seen from Table 2.1, there is a rapid emergence of new families between 2012 and 2014, as well as a significant growth in the number of new samples in each family. This may be due to a bias in the data set toward more recent samples, or to the multiplication of samples due to polymorphism in newer families (e.g., Winlock, Urausy, and Reveton). The table also shows the types of ransomware attacks we observed among each family in our data set (in addition to locking the user desktop). In particular, we observed that 61.22% of the samples (57 variants) only targeted the desktop of compromised computers, without touching the documents in the file system. More details on the locking procedure are discussed in Section 2.3.1. Encrypting the victim files in addition to locking the desktop was observed in 5.37% of samples in four families (Cryptolocker, CryptoWall, Filecoder, and Gpcode).

Table 2.1: The list of malware families used in our experiments.

Family        Samples         Variants  First Seen  Most Recent  Types of Attacks (Encrypting Files / Changing MBR / Deleting Files / Stealing Info)
Reveton       244 (17.95%)    14        2012        2014         ✓ ✓
Cryptolocker  32 (2.35%)      4         2013        2014         ✓ ✓
CryptoWall    11 (0.8%)       2         2014        2014         ✓
Tobfy         122 (8.97%)     12        2010        2014         ✓
Seftad        23 (1.69%)      4         2006        2010         ✓
Winlock       308 (22.66%)    27        2008        2013         ✓
Loktrom       4 (0.29%)       2         2012        2013
Calelk        9 (0.663%)      2         2009        2010
Urausy        523 (38.48%)    16        2009        2014         ✓ ✓
Krotten       17 (1.25%)      3         2008        2009
BlueScreen    4 (0.29%)       1         2008        2009
Kovter        8 (0.58%)       2         2013        2013         ✓
Filecoder     9 (0.66%)       3         2012        2014         ✓ ✓
GPcode        21 (1.54%)      4         2004        2008         ✓
Weelsof       24 (1.76%)      3         2012        2013

                 Total   Encrypting Files  Changing MBR  Deleting Files  Stealing Info
No. of Samples   1,359   73 (5.37%)        23 (1.69%)    484 (35.61%)    44 (3.23%)
No. of Variants  99      13 (13.13%)       4 (4.04%)     29 (21.33%)     6 (6.06%)

We also observed the emergence of other malicious activities, such as changing the

browser setting or performing multiple infections to install other malware, in 3.23%

of the samples. Despite the fact that the number of samples performing additional

malicious activities (e.g., stealing private information) is not alarmingly high, this

phenomenon is now increasing. For example, our analysis shows that information

stealing was �rst seen in Reveton in early 2012, but other families such as Kevtor,

Urausy, and Cryptolocker started to add stealing information capabilities to their

samples after that date [41, 63]. We provide more details on the malicious behaviors

among ransomware families in Section 2.3.

29

2.2.1 Experimental Setup

We performed all malware execution experiments according to common scientific guidelines [94] inside a Cuckoo Sandbox [37] running Windows XP SP3 32-bit, with controlled access to the Internet via NAT. Network traffic (e.g., IRC, DNS, and HTTP) was allowed in order to enable command and control (C&C) communication. To contain harmful traffic (e.g., spam) during the experiments, we redirected it to a local honeypot. The network bandwidth was also reduced to mitigate potential DoS attacks.

The environment installed inside the malware analysis system includes typical data found in a user session, such as saved credentials, browser history, and other customizations. We also emulated some basic user activity by running a script in each malware run (e.g., opening a window, moving the mouse, opening a website). We then executed each sample in the analysis environment for 45 minutes to capture its execution traces. Since current ransomware samples typically start attacking the user's files right after the malicious program is executed by the user, we believe that the 45-minute threshold is sufficient for most ransomware samples to exhibit their malicious behavior. After each execution, the entire system is rolled back to a clean state to prevent any interference across executions.

2.3 Characterization and Evolution

In this section, we describe our findings based on the types of malicious activities detected in ransomware samples during our experiments. We partition the malicious activities into multiple categories and discuss our findings in each of them.

2.3.1 File System Activity

One of our first goals was to describe how a malicious process interacts with the file system when a compromised computer is under a ransomware attack. To answer this question, we investigate the common characteristics of ransomware attacks from a file system perspective, regardless of the technical differences that these attacks might have (such as the infection and key generation techniques). Multiple approaches could be used to monitor file system activity. One classic approach is to hook the System Service Descriptor Table (SSDT) [46, 66] to monitor interesting function calls. In our analysis, we developed a minifilter driver [78] to capture all I/O requests that the I/O manager generates on behalf of user-mode processes to access the file system.

To monitor the I/O requests, the minifilter driver registers callback routines with the filter manager. In our analysis, we defined pre-operation and post-operation callback routines for all IRP functions in order to precisely record any I/O and transaction activity on the files. For each file system request, we collected the process name, the process ID, the parent process ID, the pre-operation and post-operation callback times, the IRP type, the arguments, and the result of the operation. Each record is a tuple:

<PName,PID,PPID,PreOpTime,PostOpTime,IRPFlag,Args,Result>
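As an illustration only, one such record could be modeled as follows in Python. The comma-separated log format, the field types, the time unit, and the names IrpRecord and parse_record are assumptions made for this sketch; the tuple above only fixes which fields are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrpRecord:
    """One file system request, mirroring the fields of the tuple in the text."""
    pname: str           # process name
    pid: int             # process ID
    ppid: int            # parent process ID
    pre_op_time: float   # pre-operation callback time (seconds, assumed unit)
    post_op_time: float  # post-operation callback time (seconds, assumed unit)
    irp_flag: str        # IRP type, e.g. "IRP_MJ_WRITE"
    args: str            # request arguments (here simply the target path)
    result: str          # outcome of the operation, e.g. "SUCCESS"

    @property
    def latency(self) -> float:
        """Time spent between the pre- and post-operation callbacks."""
        return self.post_op_time - self.pre_op_time


def parse_record(line: str) -> IrpRecord:
    """Parse one log line; assumes eight comma-separated fields without embedded commas."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    pname, pid, ppid, pre, post, irp, args, result = fields
    return IrpRecord(pname, int(pid), int(ppid), float(pre), float(post), irp, args, result)
```

Keeping both callback times in the record makes it possible to order requests and to measure how long each operation took, which is useful when comparing traces of different processes.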

The minifilter, with its different callback routines, allows us to capture all the read, write, and attribute change requests to the file system at the closest possible level to the file system driver. Our minifilter driver is deployed in privileged kernel mode, which has access to nearly all objects of the operating system. Furthermore, since we captured the file system activity directly from the I/O manager in the kernel, there was a low chance that cybercriminals could bypass our monitor. When looking at the execution traces of the malware programs in the analysis environment, we observed that the way malicious processes generate requests to access the file system was significantly different from benign processes. By closely examining the file system activity of multiple ransomware samples, we were able to distinguish multiple attack strategies that ransomware families used while the system was under attack. We discuss our findings in the following sections.

Encryption Mechanisms

As presented in Table 2.1, 5.37% of the samples among four families employed some encryption mechanism during the experiments. Our analysis shows that existing ransomware samples use both customized and standard cryptosystems during the attacks. The customized cryptosystems are not necessarily more reliable or complicated than the standard cryptosystems that Windows platforms provide (e.g., CryptoAPI). Cybercriminals develop their own cryptosystems for multiple reasons. One reason is probably to decrease the chance of being easily detected by common malware analysis techniques (e.g., PE header checking, hooking standard API functions). One of the key features crypto-style ransomware samples should have is to reliably minimize the chance of recovering the original data after generating the encrypted files. Some of the modern crypto-style ransomware families such as Cryptolocker and CryptoWall make use of standard Windows functions to perform their file encryption. They simply call CryptEncrypt with a handle to the encryption key and a pointer to a buffer that contains the plaintext to be encrypted. In these families, the plaintext in the buffer is directly overwritten with the encrypted data created by this function. As depicted in Table 2.2, the I/O manager generates IRP_MJ_CREATE on behalf of the malicious process to open the user's file. The file content is read via IRP_MJ_READ for encryption and is overwritten with the ciphertext buffer using IRP_MJ_WRITE each time a file is encrypted.

Table 2.2: The IRP requests generated during a CryptoWall attack.

Process Name  Operation                  Path                                         Result
mal.exe       IRP_MJ_CREATE              E:\MySubmissions                             SUCCESS
mal.exe       IRP_MJ_DIRECTORY_CONTROL   E:\MySubmissions\dimva2015-submission.tex    SUCCESS
mal.exe       IRP_MJ_CLEANUP             E:\MySubmissions\                            SUCCESS
mal.exe       IRP_MJ_CLOSE               E:\MySubmissions\                            SUCCESS
mal.exe       IRP_MJ_CREATE              E:\MySubmissions\dimva2015-submission.tex    SUCCESS
mal.exe       IRP_MJ_READ                E:\MySubmissions\dimva2015-submission.tex    SUCCESS
mal.exe       IRP_MJ_WRITE               E:\MySubmissions\dimva2015-submission.tex    SUCCESS
mal.exe       IRP_MJ_WRITE               E:\MySubmissions\dimva2015-submission.tex    SUCCESS
...
mal.exe       IRP_MJ_SET_INFORMATION     E:\MySubmissions\dimva2015-submission.tex    SUCCESS
mal.exe       IRP_MJ_CLOSE               E:\MySubmissions\dimva2015-submission.tex    SUCCESS
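The in-place pattern of Table 2.2 (a read of a file followed by writes to the same path by the same process) can be picked out of a trace with a few lines of Python. This is a minimal sketch, not the detection logic used in this work: the event format, a list of (process, operation, path) triples, and the function name in_place_overwrites are assumptions. Benign programs such as editors also read and rewrite files, so a match is only a signal to be combined with other evidence.

```python
def in_place_overwrites(events):
    """Return (process, path) pairs where a process read a file and later wrote to it.

    `events` is an iterable of (process, operation, path) triples in the order
    in which the requests were observed (assumed trace format).
    """
    read_seen = set()
    flagged = []
    for process, operation, path in events:
        key = (process, path)
        if operation == "IRP_MJ_READ":
            read_seen.add(key)
        elif operation == "IRP_MJ_WRITE" and key in read_seen and key not in flagged:
            flagged.append(key)
    return flagged
```

Applied to the rows of Table 2.2, the function reports the single pair (mal.exe, E:\MySubmissions\dimva2015-submission.tex).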

We also observed that even if the samples do not use standard cryptosystems, it is still possible to recognize how they attack users' files. For instance, a member of the Filecoder family uses a simple customized approach to encrypt files. Unlike Cryptolocker and CryptoWall, the sample first generates an encrypted version of a file using an AES-256 encryption key and then overwrites the original file's data with the encrypted file. Table 2.3 shows how the malicious process interacts with the file system to encrypt an arbitrary file when the system is under attack. The types of IRPs generated when the malicious process operates show how a ransomware sample targets the victim's files. For example, the sequence of IRPs shows that the ransomware sample first queries the given location to find the user's file and creates handles to the original and encrypted files. The file's data is read via an IRP_MJ_READ IRP, and the encrypted data buffer is written to the destination file via an IRP_MJ_WRITE IRP. Subsequently, IRP_MJ_SET_INFORMATION is used to delete the original file after it is closed and to overwrite the original file with the encrypted file. The sequence of IRPs shown in Table 2.3 is repeated for every file on the infected system.

Table 2.3: The types of IRPs requested by a malicious process to encrypt and overwrite the victim's files during a ransomware attack.

mal.exe       IRP_MJ_CREATE              E:\MySubmissions                                   SUCCESS
mal.exe       IRP_MJ_CREATE              E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS
...
mal.exe       IRP_MJ_WRITE               E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS
...
mal.exe       IRP_MJ_CLEANUP             E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS
mal.exe       IRP_MJ_CLEANUP             E:\MySubmissions\dimva2015-submission.tex          SUCCESS
...
mal.exe       IRP_MJ_CREATE              E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS
mal.exe       IRP_MJ_SET_INFORMATION     E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS
mal.exe       IRP_MJ_CLOSE               E:\MySubmissions\dimva2015-submission.tex.crypt    SUCCESS

Another sample from Filecoder makes use of the Defragmentation API to get raw access to each file's data based on the volume sector and the cluster size. The sample overwrites the files with custom data patterns based on how the files are kept on the disk. For example, if the file mapping check shows that the file has multiple extents, the physical disk offsets of each extent must be retrieved so that they can be overwritten with the custom data pattern. If the file does not have any extents, the file is small and is kept inside its entry in the MFT. The malware uses DeviceIoControl from kernel32.dll to get the file map on the physical disk. Figure 2-1 shows how a malicious process finds the file's data and overwrites the data after the encryption. When NTFS finds the file record in the MFT, it obtains the VCN-to-LCN mapping information from the file record's data attribute. Consequently, the malicious process can easily retrieve this information and locate the file's data on the disk.
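To make the VCN-to-LCN mapping concrete, the following Python sketch resolves a virtual cluster number to its logical cluster number from a list of extents. The extent values are the ones shown in Figure 2-1; the tuple layout (starting VCN, starting LCN, number of clusters), the function names, and the 4096-byte cluster size used in the usage note are assumptions made for illustration.

```python
# Extents from Figure 2-1 as (starting VCN, starting LCN, number of clusters).
EXTENTS = [(0, 1742, 4), (4, 1794, 4)]


def vcn_to_lcn(extents, vcn):
    """Translate a virtual cluster number (offset within the file) to a
    logical cluster number (offset within the volume)."""
    for start_vcn, start_lcn, count in extents:
        if start_vcn <= vcn < start_vcn + count:
            return start_lcn + (vcn - start_vcn)
    raise ValueError(f"VCN {vcn} is not covered by any extent")


def byte_offset(extents, vcn, cluster_size):
    """Byte offset on the volume of the first byte of the given VCN."""
    return vcn_to_lcn(extents, vcn) * cluster_size
```

For example, vcn_to_lcn(EXTENTS, 5) yields 1795, and with an assumed cluster size of 4096 bytes the data of VCN 4 starts at byte 1794 * 4096 of the volume.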

Figure 2-1: The malicious process attempts to get the file map on the physical disk to overwrite the file's data after the encryption. [Diagram: the MFT ($MFT, $MFTMirr) holds the entry for Dimva2015.tex with its STANDARD INFORMATION and FILE NAME attributes and two data extents (starting VCN 0, starting LCN 1742, 4 clusters; starting VCN 4, starting LCN 1794, 4 clusters), so that VCNs 0 to 7 map to LCNs 1742 to 1745 and 1794 to 1797. The infected process reaches the file object through its handle table, calls SetFilePointerEx to move to the beginning of the file, and overwrites the original file's data with a custom data pattern after encryption.]

Encryption techniques (e.g., key generation and key management) in crypto-style ransomware families have also evolved significantly. For example, a Gpcode variant generates a static key during the attack. This key is used to encrypt all the non-system files. Finding the encryption key in this variant is fairly simple, and we were able to retrieve the key by comparing the encrypted file and the original one. The most recent Gpcode variant in our data set encrypts the files using a unique AES-256 encryption key. The encryption key is then encrypted using a 1024-bit RSA public key. Another change we observed over time is the place where an asymmetric key pair is generated. For example, in a sample (md5:ffcf2bb69f23c7c234d2f2ee380cdaa4) created in 2012, the master key is generated locally on the compromised computer and can be extracted by inspecting memory. The use of RSA keys with different key lengths in Cryptolocker was previously reported [88], but at the time of writing, we observed only samples with 1024-bit RSA public keys in our data sets. The RSA public key is generated remotely on the C&C server once the compromised computer successfully sends a POST request to the C&C servers. If the sample cannot connect to the C&C servers, the malicious behavior is not triggered. The sample md5:04fb36199787f2e3e2135611a38321eb only encrypted users' files on the logical drives present in the system. An evolution in this family is the encryption of connected drives. The sample (md5:f1e2de2a9135138ef5b15093612dd813) encrypts all non-system files, including those on network shares, to minimize the possibility of recovering files without paying the ransom. These ransomware samples simply employ GetLogicalDrives, GetDriveType, or similar functions to find network drives.

Table 2.4: A set of IRP requests generated on behalf of a malicious process to delete files during an attack.

mal.exe       IRP_MJ_DIRECTORY_CONTROL   E:\*                  SUCCESS
mal.exe       IRP_MJ_CLEANUP             E:\                   SUCCESS
mal.exe       IRP_MJ_CLOSE               E:\                   SUCCESS
mal.exe       IRP_MJ_CREATE              E:\                   SUCCESS
mal.exe       IRP_MJ_DIRECTORY_CONTROL   E:\MySubmissions\*    SUCCESS
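The drive enumeration mentioned above relies on two documented conventions of the Windows API: GetLogicalDrives returns a bitmask in which bit 0 stands for drive A:, bit 1 for B:, and so on, and GetDriveType returns DRIVE_REMOTE (4) for network drives. The Python sketch below only decodes such values; it does not call the Windows API itself. The function names and the drive_type callback (which on a real system would wrap GetDriveType) are assumptions made for illustration.

```python
import string

DRIVE_REMOTE = 4  # GetDriveType return value for a network drive


def drives_from_mask(mask):
    """Decode a GetLogicalDrives-style bitmask into root paths such as 'C:\\'."""
    return [
        f"{letter}:\\"
        for bit, letter in enumerate(string.ascii_uppercase)
        if mask & (1 << bit)
    ]


def network_drives(mask, drive_type):
    """Return the drives whose type, as reported by `drive_type`, is DRIVE_REMOTE.

    `drive_type` is a hypothetical callback mapping a root path to a drive type code.
    """
    return [root for root in drives_from_mask(mask) if drive_type(root) == DRIVE_REMOTE]
```

A defensive tool can use the same enumeration to decide which volumes, including mapped network shares, need to be covered by file system monitoring.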

Deletion Mechanisms

In this part, we specifically discuss file deletion mechanisms that are unique to ransomware attacks. 35.6% of samples among five common ransomware families do not perform any encryption. Instead, they delete the user's files if the user does not pay the ransom. On the other hand, we observed that certain samples in the Gpcode and Filecoder families deleted the original unencrypted file's data after the encryption occurred. Consequently, deletion is a common operation among multiple ransomware families in our data set. Table 2.4 shows a sequence of IRPs collected while running a sample from the Filecoder family. The malicious process uses the IRP_MJ_DIRECTORY_CONTROL function to list the files and then requests to open each file via the Win32 CreateFile function. Create requests are performed by the IRP_MJ_CREATE function, which returns a handle to the file object. Finally, the file is deleted via IRP_MJ_SET_INFORMATION when the file is closed. We observed very similar approaches in other families such as Gpcode, Reveton, and Urausy in spite of differences in other aspects of the attacks.

Figure 2-2: Disk layout for files with different sizes in the NTFS file system.

In the NTFS file system, each file has an entry in the Master File Table (MFT) that reflects the changes of the corresponding file or folder [26]. The core attributes of a file in each MFT entry are the $STANDARD_INFORMATION attribute and the $DATA attribute, which contains the content of the corresponding file. The content of the $DATA attribute can be resident or non-resident in the MFT entry depending on the size of the file. Figure 2-2 shows the disk layout for files with different sizes in the NTFS file system. The status of a file is determined by both a flag and a $BITMAP in an MFT entry. $BITMAP manages the information about the allocation status of clusters within the disk.

When a ransomware attack occurs, the malware lists the non-system files and initiates a delete operation for each of them. The MFT entry for each file is updated by changing the status flag value of the file from 0x01 to 0x00. Furthermore, the $BITMAP attribute in the MFT file is set to zero for the corresponding file. For large files, since multiple clusters might be allocated, the location of fragmented data is saved in the runlist in the header of the MFT entry. When the file is deleted, the clusters that are used to keep the file's data are set to unallocated in the $BITMAP attribute in the MFT file. Consequently, when a file is deleted in a typical ransomware attack, the MFT entry is updated, but the content of the file is not deleted immediately. Therefore, our analysis suggests that we can detect ransomware attacks that target users' files based on the changes in the MFT and also recover the content associated with the deleted files, owing to the design of the NTFS file system. Finally, Figure 2-3 shows the delete operation from a different perspective: the malicious process tries to delete a large file that is fragmented among multiple clusters.

Figure 2-3: A ransomware attack (Gpcode) with a simple delete operation.

Changing Master Boot Records

One of the ransomware families (Seftad) was developed to attack the Master Boot Record (MBR), which contains the executable boot code and the partition table. The MBR is located on the first sector of a hard disk, and it is loaded into memory at boot time, when the system transfers control to the code stored in the MBR. Samples that target the MBR prevent the infected system from loading the boot code in the active partition by simply replacing the MBR with a bogus one that displays a message asking for a ransom. Defeating this type of ransomware attack is quite simple. For example, in early samples, the unlock code was hard-coded into the binary and could be acquired by reverse engineering. Following this procedure, we discovered the unlock code in 18 Seftad samples in our data set.

Locking Procedure

An important step in a successful ransomware attack is to lock the desktop of the computer under attack. This is typically done by creating a new desktop and making it persistent. Ransomware samples simply use CreateDesktop to create a fresh desktop environment and eliminate unnecessary processes. The new desktop is created with the DESKTOP_SWITCHDESKTOP access right, which enables the SwitchDesktop function to activate the new desktop and receive input from the victim. The desktop is assigned to a thread using the SetThreadDesktop function. A significant number of samples in our data set (61.22%) use very similar approaches to establish a persistent desktop lock.

A small number of samples (8 variants) in families like Urausy, Reveton, and Winlock employed another approach to lock the desktop. In these families, the lock banner is simply downloaded as an HTML page with corresponding images based on the victim's geographical location, and it is then displayed in full screen in an IE window with hidden controls. The banner presents a local law enforcement warning in the language used in the victim's geographical location. The warning typically says that the operating system is locked due to infringement of certain laws (e.g., distributing copyrighted materials or visiting child pornography sites) in that location.

Disabling certain keyboard shortcuts such as toggling (e.g., Windows key + Tab) happens automatically once a new desktop is created because no other applications are open to toggle through. However, disabling special keys is another part of the locking procedure. This is done by installing hook procedures that monitor keyboard input events. The number of disabled keys differed among ransomware families. For example, 18 variants in Reveton and Urausy disabled the Windows keys to prevent the victims from opening the Start menu, and 72 variants among 15 families attempted to disable the Esc key to prevent the victims from using keyboard shortcuts (e.g., starting Windows Task Manager) during the attack.

2.3.2 Mitigation Strategies

API Call Monitoring

As discussed in Section 2.3.1, a significant number of ransomware samples use Windows API functions to lock the victim's desktop. Those API calls can be used to model the application behavior and train a classifier to detect suspicious sequences of Windows API calls. This approach is not necessarily novel, but it would allow us to stop a large number of ransomware attacks that are produced with little technical effort. For example, the calls GetThreadDesktop, CreateDesktopW, and SwitchDesktop can be treated as one such sequence of API calls. Of course, cybercriminals might be able to evade detection using different techniques. For example, they may use native APIs to directly lock the system under attack. However, implementing such ransomware samples requires significant work, since the native APIs are not properly documented and may change among different versions, which can limit the portability of the attack.
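As a much simpler stand-in for the trained classifier discussed above, the Python sketch below checks whether a recorded API call trace contains the desktop-locking calls in order, possibly interleaved with unrelated calls. It is a rule-based illustration, not the model proposed in the text; the trace format (a list of function names) and the names LOCK_PATTERN and contains_in_order are assumptions.

```python
# Desktop-locking calls named in the text, in the order they are expected.
LOCK_PATTERN = ("GetThreadDesktop", "CreateDesktopW", "SwitchDesktop")


def contains_in_order(trace, pattern=LOCK_PATTERN):
    """Return True if `pattern` occurs in `trace` as an ordered subsequence.

    Other calls may appear between the pattern elements. The membership test
    consumes the iterator, so each pattern element must be found after the
    previous one.
    """
    remaining = iter(trace)
    return all(call in remaining for call in pattern)
```

A fixed pattern like this is easy to evade by reordering or replacing calls, which is one reason a classifier trained on many benign and malicious sequences is the more robust choice.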


Monitoring File System Activity

Our analysis also suggests that it is possible to detect ransomware attacks, even the ones using deletion and encryption capabilities, based on our findings in Section 2.3.1. Our analysis shows that significant changes occur in file system activity (e.g., a large number of similar encryption or deletion requests) when the system is under a ransomware attack. By closely monitoring the MFT, one can detect the creation, encryption, or deletion of files. For example, when the system is under a ransomware attack, a significant number of status changes occur in a very short period of time in the MFT entries of the deleted files. For encrypted files, we notice a large number of MFT entries with encrypted content in the $DATA attribute of files that do not share the same path (e.g., files within a directory). In our definition, a malicious MFT entry is an MFT entry that is generated or modified in a system under a ransomware attack. A classifier can be trained on benign and malicious MFT entries to detect abnormal file system activity when the system is under attack.
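The observation that many status changes occur within a very short period can be illustrated with a sliding-window check over event timestamps. The Python sketch below is a simple threshold rule rather than the classifier described above; the input (timestamps in seconds of delete or overwrite requests issued by one process), the function name burst_detected, and the window and threshold values in the usage note are assumptions chosen for illustration.

```python
from collections import deque


def burst_detected(timestamps, window_seconds, threshold):
    """Return True if at least `threshold` events fall within any span of
    `window_seconds` seconds."""
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop events that are older than the window relative to the newest one.
        while t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            return True
    return False
```

For instance, five delete requests issued within half a second trigger burst_detected(ts, 1.0, 5), whereas five requests spread over forty seconds do not. Suitable values would have to be calibrated against benign workloads.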

In order to distinguish between benign and malicious file system activity, another possible approach consists of monitoring all the file system requests that user-mode processes generate. A system with protection capabilities can intercept all the requests and discard the suspicious ones before they reach the file system driver. Recovering the files deleted in ransomware attacks would also be possible. If the $DATA attribute is resident in the MFT entry, the content of the file can simply be copied to another location. For non-resident $DATA attributes, we need to parse the RunList in the MFT entry and copy the raw data to another location to perform the recovery. In any case, early detection of the attack is critical in order to successfully recover the content of deleted files, since the deallocated clusters can be allocated to new files, in which case the content of the deleted file will be overwritten. This approach can be applied to most ransomware samples with either customized or standard cryptosystems, since file-level activity is a common characteristic of ransomware samples that target users' files.
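Parsing the run list of a non-resident $DATA attribute can be sketched as follows in Python. The sketch assumes the commonly documented NTFS data-run encoding: each run starts with a header byte whose low nibble gives the size in bytes of the run length field and whose high nibble gives the size of the offset field; the length is an unsigned little-endian integer, the offset is a signed little-endian delta from the starting LCN of the previous run, and a zero header byte ends the list. The function name and the example bytes, which encode the two extents of Figure 2-1, are made up for illustration.

```python
def decode_runlist(data: bytes):
    """Decode NTFS data runs into a list of (starting LCN, cluster count).

    Sparse runs (offset field of size zero) are returned with LCN None.
    """
    runs = []
    pos = 0
    previous_lcn = 0
    while pos < len(data) and data[pos] != 0:
        header = data[pos]
        length_size = header & 0x0F
        offset_size = header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, length))  # sparse run: no clusters on disk
            continue
        delta = int.from_bytes(data[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        previous_lcn += delta
        runs.append((previous_lcn, length))
    return runs
```

With the example bytes 21 04 CE 06 11 04 34 00, the function returns the runs (1742, 4) and (1794, 4); the clusters of each run can then be read from the volume and copied to another location.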

Using Decoy Resources

The attack strategies adopted to encrypt or delete the user's files are very similar among ransomware families. For example, the malicious process aggressively attacks all files (in different paths, and with different extensions) and tries to encrypt and/or delete them in a very short period of time. Therefore, defining a file system activity model that reflects normal interaction with the file system is possible. However, cybercriminals could try to evade detection by launching attacks while mimicking normal user behavior. For example, a cybercriminal may avoid aggressively encrypting all files and start by encrypting files with recent access or modification times. Attacks like this might not be detected by approaches that monitor the behavior of the system. However, one technique to detect these attacks could be to install decoy files in multiple locations on the disk that are constantly monitored. The use of decoy resources to detect security breaches and insider attacks was first proposed in [24, 112]. Decoy resources have also recently been used to improve the security of hashed passwords [50] and to detect illegally obtained data from file hosting services [83].

In our definition, monitoring decoy files can be an additional layer of defense on top of file system activity monitoring to detect ransomware attacks. The decoy files should be placed at multiple locations in the user environment and should be generated in a way that makes it computationally difficult for an adversary to distinguish them from genuine files. This approach can increase the chance of detecting the malicious process in the early stages of an attack, regardless of whether the ransomware sample uses novel strategies or customized or standard cryptosystems.
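A minimal way to experiment with decoy files is to plant them, record a cryptographic hash of each, and later check whether any decoy was modified or removed. The Python sketch below does this by polling, which is an assumption made to keep the example self-contained; monitoring at the file system level, as discussed above, would react at the moment of the write instead. The function names, the file name, and the decoy content are hypothetical.

```python
import hashlib
import os


def _digest(path):
    """SHA-256 digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def plant_decoys(directories, filename, content):
    """Write one decoy file per directory and return {path: baseline digest}."""
    baseline = {}
    for directory in directories:
        path = os.path.join(directory, filename)
        with open(path, "wb") as handle:
            handle.write(content)
        baseline[path] = _digest(path)
    return baseline


def tampered_decoys(baseline):
    """Return the decoy paths that were deleted or whose content changed."""
    changed = []
    for path, digest in baseline.items():
        if not os.path.exists(path) or _digest(path) != digest:
            changed.append(path)
    return changed
```

Since no legitimate program has a reason to touch a decoy, any entry returned by tampered_decoys is a strong hint that a process is rewriting or deleting files indiscriminately.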

2.4 Financial Incentives

Since the ultimate goal of ransomware attacks is to get money from victims, the payment method is an important aspect of the attacks. Cybercriminals continuously strive to find more reliable charging methods by improving two important properties: (1) the difficulty of tracing the recipient of the payments, and (2) the ease of exchanging payments into a preferred currency. Table 2.5 provides a breakdown of the charging methods used by ransomware families over the past years. Our analysis suggests that sending SMS messages to premium numbers is not limited to older types of ransomware attacks. For example, the charging method in Calelk is still based on premium numbers. The premium rate numbers were hard-coded in the ransomware sample or were downloaded from the C&C servers in each infection. This class of ransomware attacks requires the least amount of technical background, and when propagated on a large scale the revenue can be significant.

A large fraction of ransomware samples (88.22%) used prepaid online payment systems such as Moneypak, Paysafecard, and Ukash cards, since they provide limited possibilities to trace the money. These services are not tied to any banking authority, and the owner of the money is anonymous. The ransomware business model takes advantage of these systems since there are no records of the vouchers with which to trace cybercriminals. In a typical scenario, once a ransomware criminal receives the vouchers, he can monetize them by selling them in underground voucher exchange forums, ICQ, or other untraceable communication channels for a lower price than the nominal value of the vouchers. We also found some unconventional methods used for charging victims. We found two variants of the Kovter family that forced users to buy a software package which unlocked the compromised computer. Figure 2-4 represents the amount charged per family based on our data set. The amount of money required by ransomware owners to unlock the computer changes based on variants and families. For example, 48.43% of samples among the top six families demanded between 150 and 250 dollars.

Table 2.5: Summary of types of charges in 15 ransomware families. The types of charge recorded are Premium Number, Untraceable Payments, Online Shopping, and Bitcoin Transactions; the totals in the last two rows are listed in that order.

Family              Type of Charge
Reveton             ✓ ✓
Cryptolocker        ✓ ✓
CryptoWall          ✓ ✓
Tobfy               ✓
Seftad              ✓
Winlock             ✓ ✓
Loktrom             ✓
Calelk              ✓
Urausy              ✓ ✓
Krotten             ✓
BlueScreen          ✓
Kovter              ✓ ✓
Filecoder           ✓
GPcode              ✓
Weelsof             ✓
Number of Samples   132 (9.71%), 1,199 (88.22%), 14 (1.03%), 43 (3.16%)
Number of Variants  18 (18.2%), 77 (77.78%), 4 (4.05%), 6 (6.06%)

Figure 2-4: The amount of ransom money among common ransomware families. [Chart; x-axis: top ransomware families (Reveton, Cryptolocker, Winlock, Weelsof, Urausy); y-axis: amount of ransom requested per family, 0 to 600.]

2.4.1 Bitcoin as a Charging Method

Bitcoin provides some unique technical and privacy advantages for miscreants behind ransomware attacks. Bitcoin transactions are cryptographically signed messages that embody a fund transfer from one public key to another, and only the corresponding private key can be used to authorize the fund transfer. Furthermore, Bitcoin keys are not explicitly tied to real users, although all transactions are public. Consequently, ransomware owners can protect their anonymity and avoid revealing any information that might be used for tracing them.

We performed an analysis of the use of Bitcoins in recent ransomware attacks where victims had to buy Bitcoins in order to regain access to their resources. We acquired the Bitcoin addresses by searching the web as well as public forums [89] that hosted discussions on Cryptolocker attacks. Victims typically participated in the discussions by posting information about their infection and the Bitcoin addresses to which they were required to send the ransom. We collected 1,872 Bitcoin addresses during the experiments. We automatically queried the transactions from publicly accessible Bitcoin block explorer websites [23] and parsed the results into a database.

Figure 2-5: The number of Bitcoins per address. [CDF plot; x-axis: number of Bitcoins (1 to 60); y-axis: number of Bitcoins per address (CDF), 0.0 to 1.0.]

The number of Bitcoins collected by cybercriminals during the Cryptolocker attack was previously reported [97]. Our main focus in this part is to provide insights into how cybercriminals employed Bitcoin to collect the ransom fee, based on the transaction history. One of the questions we wanted to answer was whether it is possible to detect illicitly gained Bitcoins based on the transaction history of a Bitcoin address. Our analysis suggests that identifying these Bitcoins is becoming significantly more difficult, since cybercriminals have started to use evasive approaches to protect their privacy (e.g., multiple independent Bitcoin addresses, small Bitcoin amounts, short activity periods, small transaction records) after receiving large volumes of Bitcoins from victims. One reason to use multiple independent addresses with small Bitcoin amounts could be that concealing the source of thousands of illicitly obtained Bitcoins is critical if cybercriminals want to transfer the Bitcoins via recognized exchanges without being noticed. In fact, this is the main evolution in the use of Bitcoin in ransomware attacks: it makes potential tracing procedures in the Bitcoin network more difficult.

Figure 2-6: The total number of transactions per Bitcoin address. [CDF plot; x-axis: number of transactions (2 to 62); y-axis: number of transactions per address (CDF), 0.0 to 1.0.]

Our analysis of Bitcoin transactions shows that 84.46% of Bitcoin addresses had no more than six transactions. Furthermore, a significant fraction of these Bitcoin addresses (68.93%) were active for at most 10 days. These addresses were directly used to receive Bitcoins from victims. Another type of address had more transactions and was active for a longer period of time (e.g., more than 10 days). These addresses were used to aggregate the collected ransom fees. Figure 2-5 shows the CDF of the number of Bitcoins per Bitcoin address. 48.9% of the Bitcoin addresses that we analyzed received at most two Bitcoins. These transactions occurred in the early stages of the attacks, when two Bitcoins were worth roughly 200 dollars, equal to the ransom fee required by cybercriminals to send the decryption key.
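Per-address statistics of this kind can be derived from a flat list of transactions. The Python sketch below computes, for each address, the number of transactions and the span of activity in days, and then the fraction of addresses satisfying a condition. The input format (pairs of address and Unix timestamp), the function names, and the placeholder addresses in the usage note are assumptions; this is not the database pipeline used in our analysis.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def address_profile(transactions):
    """Map each address to (number of transactions, active period in days).

    `transactions` is an iterable of (address, unix_timestamp) pairs.
    """
    stamps_by_address = defaultdict(list)
    for address, timestamp in transactions:
        stamps_by_address[address].append(timestamp)
    return {
        address: (len(stamps), (max(stamps) - min(stamps)) / SECONDS_PER_DAY)
        for address, stamps in stamps_by_address.items()
    }


def fraction(profile, predicate):
    """Fraction of addresses whose (count, active_days) satisfies `predicate`."""
    if not profile:
        return 0.0
    matching = sum(1 for count, days in profile.values() if predicate(count, days))
    return matching / len(profile)
```

For example, fraction(profile, lambda count, days: count <= 6) gives the share of addresses with no more than six transactions, and fraction(profile, lambda count, days: days <= 10) the share that was active for at most 10 days.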

Figure 2-7: The duration of activity for Bitcoin addresses. [CDF plot; x-axis: number of transactions (2 to 62); y-axis: number of transactions per address (CDF), 0.0 to 1.0.]

As shown in Figure 2-6, approximately 72.9% of Bitcoin transactions belong to Bitcoin addresses with two transactions. The incoming transaction was made by the victim to pay the ransom, and the outgoing transaction was performed by the cybercriminals. The collected Bitcoins were transferred through tens of temporary intermediate accounts or split into many small amounts to be recombined in a new account later, in order to decrease the possibility of tracing the money.

Figure 2-8: The duration of activity for Bitcoin addresses. [Bar chart; x-axis: duration of Bitcoin address activity in days (0-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, >40); y-axis: percentage of Bitcoin addresses (%), 0 to 60.]

As shown in Figure 2-8, our observations also suggest that Bitcoin addresses that were used to collect Bitcoins from victims have a relatively short duration of activity. This is due to the fact that the accumulated Bitcoins had to be transferred to other accounts within a few hours or a few days, probably to use mix services and conceal the source of the money.

2.5 Related Work

Ransomware and Underground Economy. Various security vendors have reported the threat potential of ransomware attacks based on the number of infections that they observed [11, 86, 101]. The use of cryptography to mount extortion-based attacks was first introduced in [110]. Employing Microsoft Cryptographic API (MS CAPI) calls to design cryptovirus samples was presented by Young [111]. Young demonstrated how to use MS CAPI to generate keys and encrypt the user's data.

The �rst step to analyze speci�c ransomware families was made by Gazet by

analyzing three primitive ransomware families [43]. He concluded that while these

early families were designed for massive propagation, they did not ful�ll the basic

requirements (e.g., su�ciently long encryption keys) for mass extortion.

The presence of scareware as rogue security software has also been studied over
the past few years. Stone-Gross et al. analyzed the underground economy of fake
antivirus software. They built an economic model that showed how cybercriminals
performed refunds and chargebacks in order to conceal their criminal nature
for a longer period of time [99]. Cova et al. analyzed the structure of fake antivirus
operations and measured the number of victims and the profits gained based on the
web servers used by several fake antivirus groups [36].

Bitcoin Privacy. Bitcoin has also recently received considerable interest in security
research with regard to its security and anonymity. Meiklejohn et al. developed a
clustering heuristic that was used to cluster Bitcoin addresses belonging to a particular
user [74]. They discussed the potential anonymity in the Bitcoin protocol and
the actual anonymity achieved by users. Reid et al. constructed two graphs based on
the publicly available transaction history [42]. They used the properties of these graphs
to illustrate how information leakage can be used to de-anonymize the system's users.
Using this technique, they described the flow of stolen money from MyBitcoin. Recently,
Ron et al. performed an analysis over the user graph and provided an in-depth
analysis of the largest transactions in Bitcoin history [93]. In another work, Möser
analyzed the anonymity and transaction graphs of three Bitcoin mix services. He found
that all three mix services had a distinct transaction graph pattern, but that some
of them were more successful than others [80]. In order to characterize the popularity
of illicit goods, Christin performed an analysis by extracting data from the Silk Road
marketplace [30]. Although that work does not examine the Bitcoin block chain, it
provides an estimate of the market value of such transactions.

Closer to our work, and concurrent with it, is the study by Spagnuolo et al.,
who parsed the blockchain and clustered the Bitcoin addresses that were likely to
belong to certain users or groups [97]. They labeled the users based on information
that was scraped from openly available resources, and were able to label Bitcoin
addresses in real-world cases such as Silk Road and the Cryptolocker ransomware. We
also used public repositories to extract Bitcoin addresses that belong to cybercriminals
behind ransomware attacks. However, unlike the work of Spagnuolo et al. [97], our goal
is to characterize the Bitcoin addresses used with malicious intent based on their
transaction history, rather than to de-anonymize Bitcoin users.

2.6 Conclusions

In this work, we performed a long-term analysis of ransomware families with a special
focus on their destructive functionality. The characterization of ransomware attacks
was based on 1,359 ransomware samples among 15 families that have emerged over
the last few years. Our results show that a significant number of ransomware families
share very similar characteristics in the core part of the attacks, but still lack reliable
destructive functions to successfully target victims' files.

We also describe how a malicious process interacts with the file system when a
compromised computer is under a ransomware attack. We observed that suspicious
file system activity of multiple types of destructive ransomware families can be
reliably monitored. When looking at the execution traces of the malware programs,
we observed that the way malicious processes generate requests to access the file system
was significantly different from benign processes. We also observed that different
classes of ransomware attacks with multiple levels of sophistication share very similar
characteristics from a file system perspective due to the nature of these attacks.
Contrary to recent discussions in the security community about ransomware attacks,
our analysis suggests that implementing practical defense mechanisms is still possible
if we effectively monitor file system activity, for example the changes in the Master
File Table (MFT) or the types of I/O Request Packets (IRPs) generated on behalf of
processes to access the file system. We propose a general methodology that allows us
to detect a significant number of ransomware attacks without making any assumptions
on how samples attack users' files.

Chapter 3

A Dynamic Analysis Approach to Detecting Ransomware

3.1 Introduction

Malware remains one of the most important security threats on the Internet
today. Recently, a specific form of malware called ransomware has become
very popular with cybercriminals. Although the concept of ransomware is not new
(such attacks were registered as far back as the end of the 1980s), the recent success
of ransomware has resulted in an increasing number of new families in the last few
years [11, 52, 56, 101, 103]. For example, CryptoWall 3.0 made headlines around
the world as a highly profitable ransomware family, causing an estimated $325M in
damages [102]. As another example, the Sony ransomware attack [64] received large
media attention, and the U.S. government even took the official position that North
Korea was behind the attack.

Ransomware operates in many different ways, from simply locking the desktop
of the infected computer to encrypting all of its files. Compared to traditional
malware, ransomware exhibits behavioral differences. For example, traditional malware
typically aims to achieve stealth so it can collect banking credentials or keystrokes
without raising suspicion. In contrast, ransomware behavior is in direct opposition
to stealth, since the entire point of the attack is to openly notify the user that she is
infected.

Today, an important enabler for behavior-based malware detection is dynamic
analysis. Dynamic analysis systems execute a captured malware sample in a controlled
environment and record its behavior (e.g., system calls, API calls, and network traffic).
Unfortunately, malware detection systems that focus on stealthy malware behavior
(e.g., suspicious operating system functionality for keylogging) might fail to detect
ransomware because this class of malicious code engages in activity that appears
similar to benign applications that use encryption or compression. Furthermore, these
systems are currently not well-suited for detecting the specific behaviors that
ransomware engages in, as evidenced by misclassifications of ransomware families by AV
scanners [27, 92].

In this chapter, we present a novel dynamic analysis system that is designed to
analyze and detect ransomware attacks and model their behaviors. In our approach, the
system automatically creates an artificial, realistic execution environment and
monitors how ransomware interacts with that environment. Closely monitoring process
interactions with the filesystem allows the system to precisely characterize
cryptographic ransomware behavior. In parallel, the system tracks changes to the
computer's desktop that indicate ransomware-like behavior. The key insight is that in
order to be successful, ransomware will need to access and tamper with a victim's files
or desktop. Our automated approach, called Unveil, allows the system to analyze
many malware samples at a large scale, and to reliably detect and flag those that
exhibit ransomware-like behavior. In addition, the system is able to provide insights
into how the ransomware operates, and how to automatically differentiate between
different classes of ransomware.

We implemented a prototype of Unveil in Windows on top of the popular open
source malware analysis framework Cuckoo Sandbox [37]. Our system is implemented
through custom Windows kernel drivers that provide monitoring capabilities for the
filesystem. Furthermore, we added components that run outside the sandbox to
monitor the user interface of the target computer system.

We performed a long-term study analyzing 148,223 recent general malware samples
in the wild. Our large-scale experiments show that Unveil was able to correctly
detect 13,637 ransomware samples from multiple families in live, real-world data feeds
with no false positives. Our evaluation also suggests that current malware analysis
systems may not yet have accurate behavioral models to detect different classes of
ransomware attacks. For example, the system was able to correctly detect 7,572
ransomware samples that were previously unknown and undetected by traditional
AVs, but belonged to modern file locker ransomware families. Unveil was also able
to detect a new type of ransomware that had not previously been reported by any
security company. This ransomware also did not show any malicious activity in
a modern sandboxing technology provided by a well-known anti-malware company,
while showing heavy file encryption activity when analyzed by Unveil.

The high detection rate of our approach suggests that Unveil can complement
current malware analysis systems to quickly identify new ransomware samples in the
wild. Unveil can be easily deployed on any malware analysis system by simply
attaching to the filesystem driver in the analysis environment.

In summary, this work makes the following contributions:

• We present a novel technique to detect ransomware known as file lockers, which
target files stored on a victim's computer. Our technique is based on monitoring
system-wide filesystem accesses in combination with the deployment of
automatically-generated artificial user environments for triggering ransomware.

• We present a novel technique to detect ransomware known as screen lockers.
Such ransomware prevents access to the computer system itself. Our technique
is based on detecting locked desktops using dissimilarity scores of screenshots
taken from the analysis system's desktop before, during, and after executing
the malware sample.

• We performed a large-scale evaluation to show that our approach can effectively
detect ransomware. We automatically detected and verified 13,637 ransomware
samples from a dataset of 148,223 recent general malware samples. In addition, we
found one previously unknown ransomware sample that does not belong to any
previously reported family. Our evaluation demonstrates that our technique
works well in practice (achieving a true positive [TP] rate of 96.3% at zero false
positives [FPs]), and is useful in automatically identifying ransomware samples
submitted to analysis and detection systems.

The rest of the chapter is structured as follows. In Section 3.2, we briefly present
background information and explain different classes of ransomware attacks. In
Section 3.3, we describe the architecture of Unveil and explain our detection approaches
for multiple types of ransomware attacks. In Section 3.4, we provide more details
about our dynamic analysis environment. In Section 3.5, we present the evaluation
results. Limitations of the approach are discussed in Section 3.6, while Section 3.7
presents related work. Finally, Section 3.8 concludes the chapter.

3.2 Background

Ransomware, like other classes of malware, uses a number of strategies to evade
detection, propagate, and attack users. For example, it can perform multi-infection
or process injection, exfiltrate the user's information to a third party, encrypt files, and
establish secure communication with C&C servers. Our detection approach assumes
that ransomware samples can and will use all of the techniques that other malware
samples may use. In addition, our system assumes that successful ransomware attacks
perform one or more of the following activities.

Persistent desktop message. After successfully performing a ransomware infection,
the malicious program typically displays a message to the victim. This "ransom
note" informs the users that their computer has been "locked" and provides
instructions on how to make a ransom payment to restore access. This ransom message
can be generated in different ways. A popular technique is to call dedicated API
functions (e.g., CreateDesktop()) to create a new desktop and make it the default
configuration to lock the victim out of the compromised system. Malware writers
can also use HTML or create other forms of persistent windows to display this
message. Displaying a persistent desktop message is a classic action in many ransomware
attacks.

Indiscriminate encryption and deletion of the user's private files. A crypto-style
ransomware attack lists the victim's files and aggressively encrypts any private
files it discovers. Access is restricted by withholding the decryption key. Encryption
keys can be generated locally by the malware on the victim's computer, or remotely
on C&C servers and then delivered to the compromised computer. An attacker can
use customized destructive functions or Windows API functions to delete the user's
original files. The attacker can also overwrite files with the encrypted version, or use
secure deletion via the Windows Secure Deletion API.

Selective encryption and deletion of the user's private files based on certain
attributes (e.g., size, date accessed, extension). In order to avoid detection,
a significant number of ransomware samples encrypt a user's private files selectively.
In the simplest form, the ransomware sample can list the files based on the access
date. In more sophisticated scenarios, the malware could also open an application
(e.g., word.exe) and list recently accessed files. The sample can also inject malicious
code into any Windows application to obtain this type of information (e.g., directly
reading process memory).

In this work, we address all of these scenarios where an adversary has already
compromised a system, and is able to launch arbitrary ransomware-related operations
on the user's files or desktop.

3.3 Unveil Design

In this section, we describe our techniques for detecting multiple classes of ransomware
attacks. We refer the reader to Section 3.4 for the implementation details
of the prototype.

3.3.1 Detecting File Lockers

We first describe why our system creates a unique, artificial user environment in
each malware run. We then present the design of the filesystem activity monitor and
describe how Unveil uses the output of the filesystem monitor to detect ransomware.

Generating Artificial User Environments

Protecting malware analysis environments against fingerprinting techniques is
non-trivial in a real-world deployment. Sophisticated malware authors exploit static
features inside analysis systems (e.g., name of a computer) and launch
reconnaissance-based attacks [70] to fingerprint both public and private malware analysis systems.

The static features of analysis environments can be viewed as the Achilles' heel
of malware analysis systems. One static feature that can have a significant impact
on the effectiveness of malware analysis systems is the user data, which can be
effectively used to fingerprint the analysis environment. That is, even on bare-metal
environments where classic tricks such as virtualization checks are not possible, an
unrealistic looking user environment can be a telltale sign that the code is running in
a malware analysis system.

Intuitively, a possible approach to address such reconnaissance attacks is to build
the user environment in such a way that the user data is valid, real, and
non-deterministic in each malware run. These automatically-generated user environments
serve as an "enticing target" to encourage ransomware to attack the user's data while
at the same time preventing the possibility of being recognized by adversaries.

In practice, generating a user environment is a non-trivial problem, especially
if this is to be done automatically. This is because the content generator should
not allow the malware author to fingerprint the automatically-generated user content
located in the analysis environment, or to determine that it does not belong to a
real user. We elaborate on how we automatically generate an artificial, yet realistic,
user environment for ransomware in each malware run in Section 3.4.1.

Filesystem Activity Monitor

The filesystem monitor in Unveil has direct access to data buffers involved in I/O
requests, giving the system full visibility into all filesystem modifications. Each
recorded I/O operation contains the process name, timestamp, operation type, filesystem
path, and pointers to the data buffers, together with the corresponding entropy
information for read/write requests. I/O requests are generated at the lowest possible
layer, closest to the filesystem. For example, there are multiple ways to read, write,
or list files in user-/kernel-mode, but all of these functions are ultimately converted
to a sequence of I/O requests. Whenever a user thread invokes an I/O API, an I/O request
is generated and is passed to the filesystem driver. Figure 3-1 shows a high-level design
of Unveil in the Windows environment.

Unveil's monitor sets callbacks on all I/O requests to the filesystem generated
on behalf of any user-mode process. We note that for Unveil operations, it is
desirable to set only one callback per I/O request for performance reasons, and that
this still maintains full visibility into I/O operations. In Unveil, user-mode process
interactions with the filesystem are formalized as access patterns. We consider access
patterns in terms of I/O traces, where a trace $T$ is a sequence of $t_i$ such that

$$t_i = \langle P, F, O, E \rangle,$$

where
$P$ is the set of user-mode processes,
$F$ is the set of available files,
$O$ is the set of I/O operations, and
$E$ is the entropy of read or write data buffers.

For all of the file locker ransomware samples that we studied, we empirically
observed that these samples issue I/O traces that exhibit distinctive, repetitive patterns.
This is due to the fact that these samples each use a single, specific strategy to deny
access to the user's files. This attack strategy is accurately reflected in the form of
I/O access patterns that are repeated for each file when performing the attack.
Consequently, these I/O access patterns can be extracted as a distinctive I/O fingerprint
for a particular family. We note that our approach mainly considers write or delete
requests. We elaborate on extracting I/O access patterns per file in Section 3.3.1.

[Figure 3-1: Overview of the design of I/O access monitor in Unveil. Diagram labels: User Mode / Kernel Mode; Process 1, Process 2, Process 3, ..., Process N issuing read, write, delete, and write requests; I/O Requests; I/O Scheduler; I/O Access Monitor (I/O Monitor ENTER, I/O Monitor EXIT, Record I/O Request, Identify Process, Identify File OP, I/O Type, Calculate Entropy of the file's data buffer); FileSystem Driver; Physical Device.]

I/O Data Buffer Entropy. For every read and write request to a file captured
in an I/O trace, Unveil computes the entropy of the corresponding data buffer.
Comparing the entropy of read and write requests to and from the same file offset
serves as an excellent indicator of crypto-ransomware behavior. This is due to the
common strategy of reading in the original file data, encrypting it, and overwriting
the original data with the encrypted version. The system uses Shannon entropy [69] for
this computation. In particular, assuming a uniform random distribution of bytes in a
data block $d$, we have

$$H(d) = -\sum_{i=1}^{n} \frac{\log_2 n}{n}.$$
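The read/write entropy comparison described above can be sketched as follows. This is a minimal illustration, not Unveil's kernel-mode implementation: it uses the general Shannon definition over observed byte frequencies, and the names `shannon_entropy`, `entropy_jump`, and the `min_delta` threshold are assumptions introduced here.

```python
import math
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    total = len(buf)
    counts = Counter(buf)  # byte value -> number of occurrences
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_jump(read_buf: bytes, write_buf: bytes, min_delta: float = 2.0) -> bool:
    """True if the written data is markedly more random than the data read
    from the same file offset. min_delta is a hypothetical tuning knob,
    not a value taken from the text."""
    return shannon_entropy(write_buf) - shannon_entropy(read_buf) >= min_delta
```

Plain text read from a document has low entropy, while an encrypted buffer written back to the same offset approaches 8 bits per byte, so the difference is large for the read-encrypt-overwrite strategy.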

Constructing Access Patterns. For each execution, after Unveil generates I/O
access traces for the sample, it sorts the I/O access requests based on file names and
request timestamps. This allows the system to extract the I/O access sequence for
each file in a given run, and check which processes accessed each file. The key idea is
that after sorting the I/O access requests per file, repetition can be observed in the
way I/O requests are generated on behalf of the malicious process.
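The sorting step above can be sketched as follows. The record layout mirrors the fields named in the text (process, file, operation, entropy, timestamp), but the `IORequest` type and the helper names are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class IORequest(NamedTuple):
    timestamp: float
    process: str
    path: str
    operation: str  # e.g. "open", "read", "write", "delete", "close"
    entropy: float  # entropy of the data buffer; 0.0 when there is none


def per_file_sequences(trace):
    """Sort requests by (file name, timestamp) and group them per file."""
    by_file = defaultdict(list)
    for req in sorted(trace, key=lambda r: (r.path, r.timestamp)):
        by_file[req.path].append(req)
    return dict(by_file)


def operations(seq):
    """Ordered list of operation names in one per-file sequence."""
    return [r.operation for r in seq]


def processes(seq):
    """Set of processes that accessed the file."""
    return {r.process for r in seq}
```

Once the trace is grouped this way, the same operation sequence showing up for many files under one process is the repetition that the text refers to.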

The particular detection criterion used by the system to detect ransomware
samples is to identify write and delete operations in I/O sequences in each malware run.
In a successful ransomware attack, the malicious process typically aims to encrypt,
overwrite, or delete user files at some point during the attack. In Unveil, these I/O
request patterns raise an alarm, and are detected as suspicious filesystem activity. We
studied different file locker ransomware samples across different ransomware families.
Our analysis shows that although these attacks can be very different in their attack
strategies (e.g., evasion techniques, key generation, key management, connecting to
C&C servers), they can be categorized into three main classes of attacks based on
their access requests.

[Figure 3-2: Different attack strategies among ransomware families with respect to I/O access patterns. Panel (1), overwrite: File x is opened, read, overwritten, and closed. Panel (2), read, encrypt, delete: File x is opened, read, and closed; File x.locked is opened, written, and closed; File x is opened, deleted, and closed. Panel (3), read, encrypt, overwrite: File x is opened, read, and closed; File x.locked is opened, written, and closed; File x is then overwritten.]

Figure 3-2 shows the high-level access patterns for multiple ransomware families
we studied during our experiments. For example, the access pattern shown on the
left is indicative of Cryptolocker variants that have varying key lengths and desktop
locking techniques. However, the access pattern remains constant across family
variants. We observed the same I/O activity for samples in the CryptoWall
family as well. While these are identified as two different ransomware families,
they use the same encryption functions to encrypt files (i.e., the Microsoft
CryptoAPI), and therefore have similar I/O patterns when they attack user files.

As another example, in the FileCoder family, the ransomware first creates a new file,
reads data from a victim's file, generates an encrypted version of the original data,
writes the encrypted data buffer to the newly generated file, and simply unlinks
the user's original file (see Figure 3-2.2). In this class of file locker ransomware, the
malware does not wipe the original file's data from the disk. For attack approaches like
this, victims have a high chance of recovering their data without paying the ransom. In
the third approach (Figure 3-2.3), however, the ransomware creates a new encrypted
file based on the original file's data and then securely deletes the original file's data
using either standard Windows APIs or custom overwriting implementations (e.g.,
the CrypVault family).
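The three classes of access patterns described above can be told apart from the operations seen on the original file and on any newly created encrypted copy. The following is a rough sketch under simplifying assumptions: operations are given as plain strings, the pairing of an original file with its encrypted copy is done by the caller, and the function name `classify_attack` is hypothetical.

```python
def classify_attack(original_ops, new_file_ops=None):
    """Return 1, 2, or 3 for the classes in Figure 3-2, or None.

    original_ops: ordered operation names observed on the victim's file.
    new_file_ops: operations on a newly created encrypted copy, if any.
    """
    reads = "read" in original_ops
    writes = "write" in original_ops
    deletes = "delete" in original_ops
    wrote_copy = new_file_ops is not None and "write" in new_file_ops

    if reads and writes and not wrote_copy:
        return 1  # original file overwritten in place with encrypted data
    if reads and wrote_copy and deletes and not writes:
        return 2  # encrypted copy written, original merely unlinked
    if reads and wrote_copy and writes:
        return 3  # encrypted copy written, original securely overwritten
    return None  # no write or delete on the original: not flagged here
```

A sequence that only reads the original file is not classified, which matches the text's focus on write and delete requests.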

3.3.2 Detecting Screen Lockers

The second core component of Unveil is aimed at detecting screen locker
ransomware. The key insight behind this component is that the attacker must display a
ransom note to the victim in order to receive a payment. In most cases, the message
is prominently displayed, covering a significant part, or all, of the display. As this
ransom note is a virtual invariant of ransomware attacks, Unveil aims to automatically
detect the display of such notes.

The approach adopted by Unveil to detect screen locking ransomware is to
monitor the desktop of the victim machine, and to attempt to detect the display of a
ransom note. Similar to Grier et al. [45], we take automatic screenshots of the
analysis desktop before and after the sample is executed. The screenshots are captured
from outside of the dynamic analysis environment to prevent potential tampering by
the malware. This series of screenshots is analyzed and compared using image
analysis methods to determine if a large part of the screen has suddenly changed between
captures. However, smaller changes in the image such as the location of the mouse
pointer, current date and time, new desktop icons, windows, and visual changes in
the task bar should be rejected as inconsequential.

In Unveil, we measure the structural similarity (SSIM) [106] of two screenshots,
taken before and after sample execution, by comparing local patterns of pixel intensities
in terms of both luminance and contrast as well as the structure of the two images.
Extracting structural information is based on the observation that pixels have strong
inter-dependencies, especially when they are spatially close. These dependencies
carry information about the structure of the objects in the image. After a successful
ransomware attack, the display of the ransom note often results in automatically
identifiable changes in the structural information of the screenshot (e.g., a large
rectangular object covers a large part of the desktop). Therefore, the similarity of the
pre- and post-attack images decreases significantly, and can be used as an indication
of ransomware.

In order to avoid false positives, Unveil only takes screenshots resulting from
persistent changes (i.e., changes that cannot be easily dismissed through user
interaction). The system first removes transient changes (e.g., by automatically closing
open windows) before taking screenshots of the desktop. Using this preprocessing
step, ransomware-like applications that are developed for other purposes such as fake
AV are safely categorized as non-ransomware samples.

Unveil also extracts the text within the area where changes in the structure
of the image have occurred. The system extracts the text inside the selected area
and searches for specific keywords that are highly correlated with ransom notes
(e.g., <lock, encrypt, desktop, decryption, key>).

Given two screenshots $X$ and $Y$, we define the structural similarity index of the
image contents of local windows $x_j$ and $y_j$ as

$$\mathrm{LocalSim}(x_j, y_j) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$

where $\mu_x$ and $\mu_y$ are the mean intensities of $x_j$ and $y_j$, $\sigma_x$ and $\sigma_y$ are the
standard deviations, used as estimates of the contrast of $x_j$ and $y_j$, and $\sigma_{xy}$ is the
covariance of $x_j$ and $y_j$. The local window size used to compare the content of the two
images was set to $8 \times 8$. The constants $c_1$ and $c_2$ are division stabilizers in the SSIM
index formula [106]. We define the overall similarity between the two screenshots $X$ and $Y$
as the arithmetic mean of the similarity of the image contents $x_j$ and $y_j$ at the $j$th
local window, where $M$ is the number of local windows of $X$ and $Y$:

$$\mathrm{ImgSim}(X, Y) = \frac{1}{M}\sum_{j=1}^{M} \mathrm{LocalSim}(x_j, y_j).$$

Since the overall similarity is always in $[0, 1]$, the distance between $X$ and $Y$ is simply
defined as

$$\mathrm{Dist}(X, Y) = 1 - \mathrm{ImgSim}(X, Y).$$

Finally, we define a similarity threshold $\tau_{sim}$ such that Unveil considers the sample
a potential screen locking ransomware if

$$\mathrm{Dist}(X, Y) > \tau_{sim}.$$

Unveil then extracts the text within the image and searches for ransomware-related
words within the modified area. Applying the image similarity test with the best
similarity threshold (see Section 3.5.2) gives us the highest recall with 100% precision
for the entire dataset.
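The similarity test can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the prototype's code: screenshots are grayscale images given as lists of rows of intensities in 0 to 255, windows are taken as non-overlapping 8 × 8 tiles (the text does not say whether windows overlap), and $c_1$, $c_2$ use the customary SSIM defaults $(0.01 \cdot 255)^2$ and $(0.03 \cdot 255)^2$, which the text does not confirm.

```python
def local_sim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """LocalSim for two equally sized windows given as flat intensity lists."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def windows(img, size=8):
    """Yield non-overlapping size x size tiles; edge remainders are ignored."""
    rows, cols = len(img), len(img[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]


def img_sim(X, Y, size=8):
    """ImgSim: arithmetic mean of LocalSim over all M windows.
    Assumes both images have the same shape and are at least size x size."""
    sims = [local_sim(x, y) for x, y in zip(windows(X, size), windows(Y, size))]
    return sum(sims) / len(sims)


def dist(X, Y, size=8):
    """Dist(X, Y) = 1 - ImgSim(X, Y)."""
    return 1.0 - img_sim(X, Y, size)


def is_potential_screen_locker(before, after, tau_sim):
    """Flag the sample when the distance exceeds the threshold tau_sim."""
    return dist(before, after) > tau_sim
```

Identical screenshots give a distance of essentially zero, while a desktop replaced by a uniform full-screen overlay gives a distance close to one.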

3.4 Unveil Implementation

In this section, we describe the implementation details of a prototype of Unveil for
the Windows platform. We chose Windows for a proof-of-concept implementation
because it is currently the main target of ransomware attacks. We elaborate on
how Unveil automatically generates artificial, but realistic user environments for
each analysis run, how the system-wide monitoring was implemented, and how we
deployed the prototype of our system.

3.4.1 Generating User Environments

In each run, the user environment is made up of several forms of content such as
digital images, videos, audio files, and documents that can be accessed during a
user's Windows session. The user content is automatically generated according to the
following process.

For each file extension from a space of possible extensions, a set of files is
generated, where the number of files for each extension is sampled from a uniform random
distribution for each sample execution. Each set of files collectively forms a document
space for the sample execution environment. From a statistical perspective,
document spaces generated for each sample execution should be indistinguishable from
real user data. As an approximation to this ideal, randomly-selected numbers of files
are generated per extension for each run according to the process described above.

In the following, we describe the additional properties that a document space
should have in order to complicate programmatic approaches that ransomware
samples can potentially use to identify the automatically-generated user environment.

Valid Content. The user content generator creates real files with valid headers and
content using standard libraries (e.g., python-docx, python-pptx, OpenSSL). Based
on empirical observation, we created four file categories that a typical ransomware
sample tries to find and encrypt: documents, keys and licenses, file archives, and
media. Document extensions include txt, doc(x), ppt(x), tex, xls(x), c, pdf and
py. Keys and license extensions include key, pem, crt, and cer. Archive extensions
include zip and rar files. Finally, media extensions include jp(e)g, mp3, and avi.
For each sample execution, a subset of extensions is randomly selected and used
to generate user content across the system.

In order to generate content that appears meaningful, we collected approximately
100,000 sentences by querying 500 English words in Google. For each query, we
collected the text from the first 30 search results to create a sentence list. We use
the collected sentences to generate the content for the user files. We used the same
technique to create a word list to give a name to the user files. The word list allows
us to create files with variable name lengths that do not appear random. Clearly,
the problem with random content and name generation (e.g., xteyshtfqb.docx) is
that the attacker could programmatically calculate the entropy of the file names
and contents to detect content that has been generated automatically. Hence, by
generating content that appears meaningful, we make it difficult for the attacker to
fingerprint the system and detect our generated files.

File Paths. Note that the system is also careful to randomly generate the
supposed victim's directory structure. For example, directory names are also generated
based on meaningful words. Furthermore, the system also associates files of certain
types with standard locations in the Windows directory structure for those file types
(e.g., the system does not create document files in a directory with image files, but
rather under My Documents). The path length of user files is also non-deterministic
and is generated randomly. In addition, each folder may have a set of sub-folders.
Consequently, the generated paths to user files have variable depths relative to the
root folder.
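The generation steps described above (a random subset of extensions, a uniformly sampled file count per extension, names and directories built from a word list, variable path depth) can be sketched as follows. This only plans relative paths and does not write valid file contents. The extension lists come from the text; `WORDS` is a placeholder for the collected word list, and the limits `max_files_per_ext`, `max_depth`, and `max_name_words` are hypothetical.

```python
import random
from pathlib import PureWindowsPath

CATEGORIES = {
    "documents": ["txt", "doc", "docx", "ppt", "pptx", "tex",
                  "xls", "xlsx", "c", "pdf", "py"],
    "keys_and_licenses": ["key", "pem", "crt", "cer"],
    "archives": ["zip", "rar"],
    "media": ["jpg", "jpeg", "mp3", "avi"],
}

# Placeholder for the word list collected from search results.
WORDS = ["budget", "report", "summer", "invoice",
         "meeting", "family", "notes", "draft"]


def plan_document_space(rng, max_files_per_ext=20, max_depth=3, max_name_words=3):
    """Return relative paths for one analysis run (duplicates not filtered)."""
    all_exts = [e for exts in CATEGORIES.values() for e in exts]
    # Random subset of extensions for this run.
    chosen = rng.sample(all_exts, rng.randint(1, len(all_exts)))
    paths = []
    for ext in chosen:
        # Number of files per extension drawn from a uniform distribution.
        for _ in range(rng.randint(1, max_files_per_ext)):
            dirs = rng.sample(WORDS, rng.randint(0, max_depth))  # variable depth
            name = "_".join(rng.sample(WORDS, rng.randint(1, max_name_words)))
            paths.append(PureWindowsPath(*dirs, f"{name}.{ext}"))
    return paths
```

Passing a seeded `random.Random` makes a run reproducible for debugging, while a fresh seed per sample execution gives the non-determinism that the text calls for.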

Time Attributes. Another non-determinism strategy used by our approach is
to generate files with different creation, modification, and access times. The file time
attributes are sampled from a distribution of likely timestamps when creating the
file. When the system creates files with different time attributes, the time attributes
of the containing folders are also updated automatically. In this case, the creation
time of the folder is the minimum of all creation times of files and folders inside the
folder, while the modification and access times are the maximum of the corresponding
timestamps.
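The folder timestamp rule can be written down directly. The following Python sketch assumes timestamps are plain numbers (e.g., seconds since an epoch); the Times record and the folder_times helper are hypothetical names, not part of Unveil.

```python
from dataclasses import dataclass

@dataclass
class Times:
    created: float
    modified: float
    accessed: float

def folder_times(children):
    """Derive a folder's timestamps from the files and sub-folders it contains."""
    if not children:
        raise ValueError("an empty folder has no children to derive times from")
    return Times(
        created=min(c.created for c in children),    # oldest child creation time
        modified=max(c.modified for c in children),  # most recent modification
        accessed=max(c.accessed for c in children),  # most recent access
    )

files = [Times(100.0, 250.0, 300.0), Times(80.0, 400.0, 410.0)]
subfolder = folder_times(files)
parent = folder_times([subfolder, Times(120.0, 130.0, 500.0)])
```

Applying the helper bottom-up, from the deepest folders to the root, keeps every folder consistent with its contents.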

While we have not observed ransomware samples that have attempted to use
fingerprinting heuristics of the content of the analysis environment, the nondeterminism
strategies used by Unveil serve as a basis for making the analysis resilient to
fingerprinting by design.

3.4.2 Filesystem Activity Monitor

Several techniques have been used to monitor sample filesystem activity in malware
analysis environments. For example, filesystem activity can be monitored by hooking
a list of relevant filesystem API functions or relevant system calls using the System
Service Descriptor Table (SSDT). Unfortunately, these approaches are not suitable for
Unveil's detection approach for several reasons. First, API hooking can be bypassed
by simply copying a DLL containing the desired code and dynamically loading it into
the process's address space under a different name. Stolen code [48, 51] and sliding
calls [51] are other examples of API hooking evasion that are common in the wild.
Furthermore, ransomware can use customized cryptosystems instead of the standard
APIs to bypass API hooking while encrypting user files. Hooking system calls via
the SSDT also has other technical limitations. For example, it is prevented on 64-bit
systems due to Kernel Patch Protection (KPP). Furthermore, most SSDT functions
are undocumented and subject to change across different versions of Windows.

Therefore, instead of API or system call hooking, Unveil monitors filesystem
I/O activity using the Windows Filesystem Minifilter Driver framework [78], which is
a standard kernel-based approach to achieving system-wide filesystem monitoring in
multiple versions of Windows. The prototype consists of two main components for I/O
monitoring and retrieving logs of the entire system with approximately 2,800 SLOC
in C++. In Windows, I/O requests are represented by I/O Request Packets (IRPs).
Unveil's monitor sets callbacks on all I/O requests to the filesystem generated on
behalf of user-mode processes. Basing Unveil's filesystem monitor on a minifilter
driver allows it to be located at the closest possible layer to the filesystem with access
to nearly all objects of the operating system.

3.4.3 Desktop Lock Monitor

To identify desktop locking ransomware, screenshots are captured from outside of
the dynamic analysis environment to prevent potential tampering by the malware.
For dissimilarity testing, a Python script implements the Structural Similarity Image
Metric (SSIM) as described in Section 3.3.2. Unveil first converts the images to
floating point data, and then calculates parameters such as mean intensity µ using
Gaussian filtering of the images' contents. We also used default values (k1 = 0.01
and k2 = 0.03) to obtain the values of c1 and c2 to calculate the structural similarity
score in local windows presented in Section 3.3.2.
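As a rough illustration of how k1 and k2 enter the score, the sketch below evaluates the standard SSIM formula from the image-quality literature on a single window of pixel intensities. It is a simplification under stated assumptions: pixels in the window are weighted uniformly rather than with a Gaussian filter, the dynamic range is assumed to be 255 (8-bit grayscale), and the function name ssim_window is hypothetical. It may differ in detail from the formulation in Section 3.3.2.

```python
def ssim_window(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized windows given as flat lists of intensities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("windows must have the same length (at least 2 pixels)")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilizing constants derived from the defaults k1 = 0.01 and k2 = 0.03.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

# Synthetic windows: an unchanged desktop region and one replaced by a lock screen.
desktop = [10, 12, 200, 220, 15, 11, 210, 205]
locked = [250, 250, 250, 250, 0, 0, 0, 0]
same = ssim_window(desktop, desktop)
changed = ssim_window(desktop, locked)
```

Identical windows score 1, while a window whose structure changed scores much lower; a full implementation would average such local scores over the whole screenshot.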

The system also employs Tesseract-OCR [91], an open source OCR engine, to
extract text from the selected areas of the screenshots. To perform the analysis on
the extracted text within images, we collected more than 10,000 unique ransom notes
from different ransomware families. We first clustered ransom notes based on the
family type and the visual appearance of the ransom notes. For each cluster, we
then extracted the ransom texts in the corresponding ransom notes and performed
pre-filtering to remove unnecessary words within the text (e.g., articles, pronouns) to
avoid obvious false positive cases. The result is a word list for each family cluster
that can be used to identify ransom notes and furthermore label notes belonging to
a known ransomware family.
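One plausible way to use such per-family word lists is a simple overlap count, sketched below in Python. The stop-word set, the two example families and their words, the min_hits cutoff, and the helper names are invented for illustration; the thesis does not specify the matching rule at this level of detail.

```python
# Hypothetical filler words removed during pre-filtering.
STOP_WORDS = {"a", "an", "the", "you", "your", "we", "our", "it", "is", "to", "of"}

def tokens(text):
    """Lower-case the OCR text, strip punctuation, and drop filler words."""
    words = (w.strip(".,!?:;\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def label_note(text, family_words, min_hits=2):
    """Return the family whose word list overlaps most with the text, or None."""
    found = tokens(text)
    best_family, best_hits = None, 0
    for family, words in family_words.items():
        hits = len(found & words)
        if hits > best_hits:
            best_family, best_hits = family, hits
    return best_family if best_hits >= min_hits else None

# Invented word lists standing in for the per-cluster lists.
FAMILY_WORDS = {
    "family_a": {"files", "encrypted", "bitcoin", "decrypt"},
    "family_b": {"computer", "locked", "fine", "police"},
}

note_label = label_note("Your files are ENCRYPTED. Pay in bitcoin to decrypt!", FAMILY_WORDS)
setup_label = label_note("Setup wizard: click next to continue", FAMILY_WORDS)
```

Text from an ordinary installer window matches no list and returns None, which mirrors how the OCR step is used to rule out large but harmless windows.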


3.5 Evaluation

We evaluated Unveil with two experiments. The goal of the first experiment is to
demonstrate that the system can detect known ransomware samples, while the goal of
the second experiment is to demonstrate that Unveil can detect previously unknown
ransomware samples.

3.5.1 Experimental Setup

The Unveil prototype is built on top of Cuckoo Sandbox [37]. Cuckoo provides basic

services such as sample submission, managing multiple VMs, and performing simple

human interaction tasks such as simulating user input during an analysis. However,

in principle, Unveil could be implemented using any dynamic analysis system (e.g.,

BitBlaze [7], VxStream Sandbox [87]).

We evaluated Unveil using 56 VMs running Windows XP SP3 on a Ganeti cluster
based on Ubuntu 14.04 LTS. While Windows XP is not required by Unveil, it
was chosen because it is well-supported by Cuckoo Sandbox. Each VM had multiple
NTFS drives. We took anti-evasion measures against popular tricks, such as changing
the IP address range and the MAC addresses of the VMs, to prevent the VMs
from being fingerprinted by malware authors. Furthermore, we permitted controlled
access to the Internet via a filtered host-only adapter. In particular, the filtering
allowed limited IRC, DNS, and HTTP traffic so samples could communicate with
C&C servers. SMTP traffic was redirected to a local honeypot to prevent spam, and
network bandwidth was limited to mitigate potential DoS attacks.

The operating system image inside the malware analysis system included typical
user data such as saved social networking credentials and a valid browsing history.
For each operating system image, multiple users were defined to run the experiments.
We also ran a script that emulated basic user activity while the malware sample
was running on the system, such as launching a browser and navigating to multiple
websites, or clicking on the desktop. This interaction was randomly generated, but
was constant across runs. Each sample was executed in the analysis environment for
20 minutes. As described in Sections 3.3.1 and 3.3.2, user environments were generated
for each run, filesystem I/O traces were recorded, and pre- and post-execution
screenshots were captured. After each execution, the VM was rolled back to a clean
state to prevent any interference across executions. All experiments were performed
according to well-established experimental guidelines [94] for malware experiments.

3.5.2 Ground Truth (Labeled) Dataset

In this experiment, we evaluated the effectiveness of Unveil on a labeled dataset,
and ran different screen locker samples to determine the best threshold value τsim for
the large-scale experiment.

We collected ransomware samples from public repositories [1, 4] and online forums
that share malware samples [3, 71]. We also received labeled ransomware samples from
two well-known anti-malware companies. In total, we collected 3,156 recent samples.
In order to make sure that those samples were indeed active ransomware, we ran
them in our test environment. We confirmed 2,121 samples to be active ransomware
instances. After each run, we checked the filesystem activity of each sample for any
signs of attacks on user data. If we did not see any malicious filesystem activity, we
checked whether running the sample displayed a ransom note.

Table 3.1 describes the ransomware families we used in this experiment. We note
that the dataset covers the majority of the current ransomware families in the wild. In
addition to the labeled ransomware dataset, we also created a dataset that consisted
of non-ransomware samples. These samples were submitted to the Anubis analysis
platform [47], and consisted of a collection of benign as well as malicious samples.
We selected 149 benign executables including applications that have ransomware-like
behavior such as secure deletion, encryption, and compression. A short list of these
applications is provided in Table 3.2. We also tested 384 non-ransomware malware
samples from 36 malware families to evaluate the false positive rate of Unveil.

Family          Type     Samples
Cryptolocker    crypto   33 (1.5%)
CryptoWall      crypto   42 (2.0%)
CTB-Locker      crypto   77 (3.6%)
CrypVault       crypto   21 (1.0%)
CoinVault       crypto   17 (0.8%)
Filecoder       crypto   19 (0.9%)
TeslaCrypt      crypto   39 (1.8%)
Tox             crypto   71 (3.3%)
VirLock         locker   67 (3.2%)
Reveton         locker   501 (23.6%)
Tobfy           locker   357 (16.8%)
Urausy          locker   877 (41.3%)
Total Samples   -        2,121

Table 3.1: The list of ransomware families used in the first experiment.

Table 3.3 shows an example of I/O traces for CryptoWall 3.0 and CryptoWall 4.0
where the victim's file is first read and then overwritten with an encrypted version.
The I/O access patterns of CryptoWall 4.0 samples to overwrite the content of the files
are identical since they use the same cryptosystem. The main difference is that the
filenames and extensions are modified with random characters, probably to minimize
the chance of recovering the files based on their names in the Master File Table (MFT)
in the NTFS filesystem.

Application   Main Capability   Version
7-zip         Compression       15.06
Winzip        Compression       19.5
WinRAR        Compression       5.21
DiskCryptor   Encryption        1.1.846.118
AESCrypt      Encryption        -
Eraser        Shredder          6.2.0.2969
SDelete       Shredder          1.61

Table 3.2: The list of benign applications that generate similar I/O access patterns
to ransomware.

Run            OP       Proc           FName            Offset      Entropy
CryptoWall 3   read     explorer.exe   document.cad     [0, 4096)   5.21
               write    explorer.exe   document.cad     [0, 4096)   7.04
               ...
CryptoWall 4   read     explorer.exe   project.cad      [0, 4096)   5.21
               write    explorer.exe   project.cad      [0, 4096)   7.11
               ...
               rename   explorer.exe   t67djkje.elkd8

Table 3.3: An example of I/O access in Unveil for CryptoWall 3.0 and CryptoWall 4.0.

Filesystem Activity of Benign Applications with Potential Ransomware-like Behavior

One question that arises is whether benign applications such as encryption or compression
programs might generate similar I/O request sequences, resulting in false
positives. Note that with benign applications, the original file content is treated carefully
since the ultimate goal is to generate an encrypted version of the original file,
and not to restrict access to the file. Therefore, the default mechanism in these applications
is that the original files remain intact even after encryption or compression.
If automatic deletion is deliberately activated by the user after the encryption, it can
potentially result in a false positive (see Figure 3-2.2). However, in our approach, we
assume that the usual default behavior is exhibited and the original data is preserved.
We believe that this is a reasonable assumption, considering that we are building an
analysis system that will mainly analyze potentially suspicious samples captured and
submitted for analysis. Nevertheless, we investigated the I/O access patterns of benign
programs, shown in Table 3.4. The I/O traces indicate that these programs
exhibit distinguishable I/O access patterns as a result of their default behavior.

Application   OP       Description
CrypVault     read     read low entropy buffer from original file
              write    write high entropy buffer to a new file
              ...
              write    overwrite the buffer of the original file
              delete   read attributes, delete the original file
CryptoWall4   read     read low entropy buffer
              write    overwrite with high entropy buffer
              ...
              rename   read attributes, rename the files
SDelete       write    overwrite data buffer
              ...
              delete   read attributes, delete the file
7-zip         read     read data buffer from original file
              write    write data buffer to a new file
              ...

Table 3.4: I/O accesses for deletion and compression mechanisms in benign/malicious
applications.

Benign applications might not necessarily perform encryption or deletion on user
files, but can change the content of the files. For example, updating the content
of a Microsoft PowerPoint file (e.g., embedding images and media) generates I/O
requests similar to ransomware (see Figure 3-2.1). However, the key difference here
is that such applications usually generate I/O requests for a single file at a time
and repetition of I/O requests does not occur over multiple user files. Also, note that
benign applications typically do not arbitrarily encrypt, compress or modify user files,
but rather need sophisticated input from users (e.g., file names, keys, options, etc.).
Hence, most applications would simply exit, or expect some input when executed in
Unveil.

Figure 3-3: Precision-recall analysis of the tool (plot annotation: t = 0.32).
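The distinction between a benign editor that rewrites one file and a process that repeats the same read-then-overwrite sequence across many user files can be illustrated with a small Python sketch. This is not Unveil's detection logic: the trace tuple layout (process, operation, path, offset), the min_files cutoff, and the function names are assumptions made for this example.

```python
from collections import defaultdict

def files_overwritten(trace):
    """Map each process to the files it read and later overwrote at the same offset."""
    seen_reads = set()
    overwritten = defaultdict(set)
    for proc, op, path, offset in trace:
        key = (proc, path, offset)
        if op == "read":
            seen_reads.add(key)
        elif op == "write" and key in seen_reads:
            overwritten[proc].add(path)
    return overwritten

def looks_repetitive(trace, min_files=3):
    """True if one process repeats the read/overwrite sequence over several files."""
    return any(len(paths) >= min_files for paths in files_overwritten(trace).values())

# Synthetic traces: an editor saving one file, and a process walking three files.
editor_trace = [
    ("powerpnt.exe", "read", "slides.pptx", 0),
    ("powerpnt.exe", "write", "slides.pptx", 0),
    ("powerpnt.exe", "write", "slides.pptx", 4096),
]
bulk_trace = [
    (proc, op, name, 0)
    for name in ("a.docx", "b.xlsx", "c.pdf")
    for proc, op in (("sample.exe", "read"), ("sample.exe", "write"))
]
```

The editor trace touches a single file and is not flagged, while the bulk trace repeats the same sequence over three files and is.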

Similarity Threshold

We performed a precision-recall analysis to find the best similarity threshold τsim for
desktop locking detection. The best threshold value to discriminate between similar
and dissimilar screenshots should be defined in such a way that Unveil is able to
detect screen locker ransomware while maintaining an optimal precision-recall rate.
Figure 3-3 shows empirical precision-recall results when varying τsim. As the figure
shows, with τsim = 0.32, more than 97% of the ransomware samples across both screen
and file locker samples are detected with 100% precision. In the second experiment,
we used this similarity threshold to detect screen locker ransomware in a malware
feed unknown to Unveil.

Evaluation            Results
Total Samples         148,223
Detected Ransomware   13,637 (9.2%)
Detection Rate        96.3%
False Positives       0.0%
New Detection         9,872 (72.2%)

Table 3.5: Unveil detection results.
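The selection of τsim by precision-recall analysis can be sketched as a sweep over candidate thresholds. The Python example below assumes a sample is flagged when its dissimilarity score is at least the threshold, and it picks the candidate with the highest recall among those with perfect precision. The scores, labels, and candidate values are synthetic, and the helper names are hypothetical; they are not taken from the thesis.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when samples scoring >= threshold are flagged."""
    tp = sum(1 for s, is_ransom in zip(scores, labels) if s >= threshold and is_ransom)
    fp = sum(1 for s, is_ransom in zip(scores, labels) if s >= threshold and not is_ransom)
    fn = sum(1 for s, is_ransom in zip(scores, labels) if s < threshold and is_ransom)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scores, labels, candidates):
    """Among thresholds with 100% precision, return (threshold, recall) with best recall."""
    best = None
    for t in candidates:
        precision, recall = precision_recall(scores, labels, t)
        if precision == 1.0 and (best is None or recall > best[1]):
            best = (t, recall)
    return best

# Synthetic dissimilarity scores and ground-truth labels (True = screen locker).
scores = [0.05, 0.10, 0.30, 0.25, 0.35, 0.60, 0.90]
labels = [False, False, False, True, True, True, True]
chosen = pick_threshold(scores, labels, [0.10, 0.20, 0.32, 0.50])
```

On this toy data the sweep settles on 0.32 with a recall of 0.75, because lower candidates admit a benign sample and higher ones miss more lockers.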

3.5.3 Detecting Zero-Day Ransomware

The main goal of the second experiment is to evaluate the accuracy of Unveil when

applied to a large dataset of recent real-world malware samples. We then compared

our detection results with those reported by AV scanners in VirusTotal.

This dataset was acquired from the daily malware feed provided by Anubis [47] to
security researchers. The samples were collected from May 18th 2015 until February
12th 2016. The feed is generated from the Anubis submission queue, which is fed
in turn by Internet users and security companies. Hence, before performing the
experiment, we filtered the incoming Anubis samples by removing those that were
obviously not executable (e.g., PDFs, images). After this filtering step, the dataset
contained 148,223 distinct samples. Each sample was then submitted to Unveil to
obtain I/O access traces and pre-/post-execution desktop image dissimilarity scores.


Detection Results

Table 3.5 shows the evaluation results of the second experiment. With the similarity
threshold τsim = 0.32, Unveil labeled 13,637 (9.2% of the dataset) samples in the
Anubis malware feed as being ransomware; these included both file locker and desktop
locker samples.

Evaluation of False Positives. As we did not have a labeled ground truth in

the second experiment, we cannot provide an accurate precision-recall analysis, and

verifying the detection results is clearly challenging. For example, re-running samples

while checking for false positives is not feasible in all cases since samples may have

become inactive at the time of re-analysis (e.g., the C&C server might have been

taken down).

Hence, we used manual verification of the detection results. That is, for the
samples that were detected as screen locker ransomware, we manually checked the
post-attack screenshots that were taken by Unveil. The combination of the
structural similarity test and the OCR technique to extract the text provides a reliable
automatic detection for this class of ransomware. We confirmed that Unveil correctly
reported 4,936 samples that delivered a ransom note during the analysis.

Recall that Unveil reports a sample as a file locker ransomware if the I/O access
pattern follows one of the three classes of ransomware attacks described in Figure 3-2.
For file locker ransomware samples, we used the I/O reports for each sample. We
listed all the I/O activities on the first five user files during that run and looked for
suspicious I/O activity such as requesting write and/or delete operations. Note that
the detection approach used in Unveil is only based on the I/O access pattern. We
do not check for changes in entropy in the detection phase; entropy is only used for our
evaluation.

If we found multiple write or delete I/O requests to the first five generated user files
and also a significant increase in the entropy between read and write data buffers at a
given file offset, or the creation of new high entropy files, we confirmed the detection
as a true positive. The creation of multiple new high entropy files based on user files
is a reliable sign of ransomware in our tests. For example, a malware sample that
uses secure deletion techniques may overwrite files with low entropy data. However,
the malicious program first needs to generate an encrypted version of the original
files. In any case, generating high entropy data raises an alarm in our evaluation.

By employing these two approaches and analyzing the results, we did not find any
false positives. There were a few cases that had a significant change in the structure of
the images. Our closer investigation revealed that the installed program generated a
large installation page, showed some unreadable characters in the window, and did not
close even if the button was clicked (i.e., non-functional buttons). In another case, the
program generated a large setup window, but it did not proceed due to a crash. These
cases produce a higher dissimilarity score than the threshold value. However, since
the extracted text within those particular windows did not contain any ransomware-related
contents, Unveil safely categorized them as being non-ransomware samples.

Evaluation of False Negatives. Determining false negative rates is a challenge

since manually checking 148,223 samples is not feasible. In the following, we provide

an approximation of false negatives for Unveil.

In our tests on the labeled dataset, false negatives mainly occurred in samples
that make persistent changes on the desktop, but since the dissimilarity score of the
pre-/post-attack screenshots is less than τsim = 0.32, they are not detected as
ransomware by Unveil. Our analysis of labeled samples from multiple ransomware
families (see Section 3.5.2) shows that these cases were mainly observed in samples
with a dissimilarity score in the interval [0.18, 0.32). This is because for lower
dissimilarity scores, changes in the screenshots are negligible or small (e.g., Windows
warning/error messages). Consequently, in order to increase the chance of catching
false negative cases, we selected all the samples whose dissimilarity score was in
[0.18, 0.32). This decreases the number of potential desktop locker ransomware samples
that were not detected by Unveil to 4,642 samples. We manually checked the post-attack
screenshots of these samples, and found 377 desktop locker ransomware samples that
Unveil was not able to detect. Our analysis shows that the false negatives in desktop
locker ransomware resulted from samples in one ransomware family that generated a
very transparent ransom note with a dissimilarity score in [0.27, 0.31] that was
difficult to read.

For file locker ransomware, we first removed the samples that were not detected
as malware by any of the AV scanners in VirusTotal after multiple resubmissions
in consecutive days (see Section 3.5.3). By applying this approach, we were able to
reduce the number of samples to check by 47%. Then, we applied an approach similar
to the one described above. We listed the first five user files generated for that sample
run and checked whether any process requested write access to those files. We also
checked the entropy of multiple data buffers. If we identified write access with a
significant increase in the entropy of data buffers compared to the entropy of the data
buffer in the read access for those files, we reported it as a false negative.
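The entropy comparison between read and write buffers used in this manual evaluation can be illustrated with byte-level Shannon entropy. The Python sketch below makes its own assumptions: entropy is measured in bits per byte over a whole buffer, and the min_increase cutoff of 1.0 bit and the function names are hypothetical rather than values or APIs from Unveil.

```python
import math
from collections import Counter

def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a buffer in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((count / n) * math.log2(count / n) for count in Counter(buf).values())

def entropy_jump(read_buf: bytes, write_buf: bytes, min_increase: float = 1.0) -> bool:
    """True if the written data is markedly more random than the data read."""
    return shannon_entropy(write_buf) - shannon_entropy(read_buf) >= min_increase

# Synthetic 4096-byte buffers: repetitive plaintext and a uniform byte spread.
plain = (b"quarterly budget report " * 171)[:4096]
uniform = bytes(range(256)) * 16
```

Encrypted or compressed data approaches 8 bits per byte, which is why the entropies in Table 3.3 rise from 5.21 on read to above 7 on write.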

Our test shows that Unveil does not have any false negatives in file locker ransomware
samples. Consequently, we conclude that Unveil is able to detect multiple
classes of ransomware attacks with a low false positive rate (FPs = 0.0% at a TP =
96.3%).

Early Warning

One of the design goals of Unveil is to be able to automatically detect previously

unknown (i.e., zero-day) ransomware. In order to run this experiment, we did the

following. Once per day over the course of the experiment, we built a malware dataset

that was concurrently submitted to Unveil and VirusTotal. If a sample was detected

as ransomware by Unveil, we checked the VirusTotal (VT) detection results. In cases

where a ransomware sample was not detected by any VT scanner, we reported it as

a new detection.

In addition, we also measured the lag between a new detection by Unveil and
a VT detection. To that end, we created a dataset from the newly detected samples
submitted on days {1, 2, ..., n − 1, n} and re-submitted these samples to see whether
the detection results changed. We considered the result of all 55 VT scanners in
this experiment. Since the number of scanners is relatively high, we defined a VT
detection ratio ρ as the ratio of the total number of scanners that identified the
sample as ransomware or malware to the total number of scanners checked by VT. ρ
is therefore a value on the interval [0,1] where zero means that the sample was not
detected by any of the 55 VT scanners, and 1 means that all scanners reported the
sample as malware or ransomware. Since there is no standard labeling scheme for
malware in the AV industry, a scanner can label a sample using a completely different
name from another scanner. Consequently, to avoid biased results, we consider the
labeling of a sample using any name as a successful detection.
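The detection ratio ρ is simple to compute once per-scanner verdicts are available. The Python sketch below assumes the verdicts have already been collected into a dictionary that maps a scanner name to its label, with None for scanners that reported nothing; this input format, the scanner names, and the function name are assumptions, and the sketch does not call any VirusTotal API.

```python
def vt_detection_ratio(verdicts):
    """Fraction of scanners that assigned any label to the sample."""
    if not verdicts:
        raise ValueError("no scanner verdicts")
    flagged = sum(1 for label in verdicts.values() if label)
    return flagged / len(verdicts)

# Synthetic report: 55 scanners, 5 of which label the sample under different names.
verdicts = {f"scanner_{i}": None for i in range(55)}
for i, name in enumerate(["Trojan.Generic", "Ransom.X", "Mal/Packed", "Win32.Agent", "Locker.A"]):
    verdicts[f"scanner_{i}"] = name
rho = vt_detection_ratio(verdicts)
```

Any non-empty label counts as a detection, which matches the choice to treat labeling under any name as successful.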

Figure 3-4: Evolution of VT scanner reports after six submissions. (Six panels,
Submission #1 through Submission #6; x-axis: Pollution Ratio; y-axis: Density
Distribution.)

In our experiment, we submitted the detected samples every day to see how the
VT detection ratio ρ changes over time. The distribution of ρ for each submission is
shown in Figure 3-4. Our analysis shows that ρ does not significantly change after
a small number of subsequent submissions. For the first submission, 72.2% of the
ransomware samples detected by Unveil were not detected by any of the 55 VT
scanners. After a few submissions, ρ does not change significantly, but is generally
concentrated either towards small or very large ratios. This means that after a few
re-submissions, either only a few scanners detected a sample, or almost all the scanners
detected the sample.

Figure 3-5: I/O activities of a previously unknown ransomware family detected by
Unveil. (Three panels: QUERY OP, READ OP, WRITE OP; x-axis: Analysis Time (Sec);
y-axis: Distribution (%). Annotated phases: userspace file fingerprinting, creating
a list of files, and periodic file encryption, separated by sleep time.)

3.5.4 Case Study: Automated Detection of a New Ransomware Family

In this section, we describe a new ransomware family, called SilentCrypt, that was
detected by Unveil during the experiments. After our system detected these samples
and submitted them to VirusTotal, several AV vendors picked up on them and also
started detecting them a couple of days later, confirming the malice of the sample
that we automatically detected.

This family uses a unique and effective method to fingerprint the runtime environment
of the analysis system. Unlike other malware samples that check for specific
artifacts such as registry keys, background processes, or platform-specific characteristics,
this family checks the private files of a user to determine if the code is running
in an analysis environment. When the sample is executed, it first checks the number
of files in the user's directories, and sends this list to the C&C server before starting
the attack.

Multiple online malware analysis systems such as malwr.com, Anubis, and a modern
sandboxing technology provided by a well-known anti-malware company did not
register any malicious activity for this sample. However, the sample showed heavy
encryption activity when analyzed by Unveil.

An analysis of the I/O activity of this sample revealed that this family first waited
for several minutes before attacking the victim's files. Figure 3-5 shows the three main
I/O activities of one of the samples in this family. The sample traverses the current
user's main directories, and creates a list of files and folders. If the sample receives
permission to attack from the C&C server, it begins encrypting the targeted files.

To confirm Unveil's alerts, we conducted a manual investigation over several days.
Our analysis concluded that the malicious activity is started only if user activity is
detected. Unlike other ransomware samples that immediately attack a victim's files
when they are executed, this family only encrypts files that have recently been opened
by the user while the malicious process is monitoring the environment. That is, the
malicious process reads the file's data and overwrites it with encrypted data if the file
is used. The file name is then updated to "filename.extension.locked_forever"
after it has been encrypted.

Unveil was able to detect this family of ransomware automatically because it
was triggered after the system accessed some of the generated user files as a part of
the user activity emulation scripts. Once we submitted the sample to VirusTotal, the
sample was picked up by other AV vendors (5/55) after five days with different labels.
A well-known, sandboxing-based security company confirmed our findings that the
malware sample was a new threat that they had not detected before. We provide an
anonymous video of a sample from this ransomware family in [10].

3.6 Discussion and Limitations

The evaluation in Section 3.5 demonstrates that Unveil achieves good, practical,

and useful detection results on a large, real-world dataset. Unfortunately, malware

authors continuously observe defensive advances and adapt their attacks accordingly.

In the following, we discuss limitations of Unveil and potential evasion strategies.

There is always the possibility that attackers will find ways to fingerprint the
automatically generated user environment and avoid it. However, this comes at a high
cost, and increases the difficulty bar for the attacker. For example, in desktop-locking
ransomware, malware can use heuristics to look for specific user interaction before
locking the desktop (e.g., waiting for multiple login events or counting the number of
user clicks). However, implementing these approaches can potentially make detection
easier since these approaches require hooking specific functions in the operating
system. The presence of these hooking behaviors is itself suspicious and is used
by current malware analysis systems to detect different classes of malware. Furthermore,
these approaches delay launching the attack, which increases the risk of being
detected by AV scanners on clients before a successful attack occurs.

Another possibility is that malware might only encrypt a specific part of a file
instead of aggressively encrypting the entire file, or simply shuffle the file content
using a specific pattern that makes the files unreadable. Although we have not seen
any sample with these behaviors, developing such ransomware is quite possible. The
key idea is that in order to perform such activities, the malicious program must
open the file with write permission and manipulate at least some data buffers of the
file content. In any case, if the malicious program accesses the files, Unveil will still
see this activity. There is no real reason for benign software to touch automatically
generated files with write permission and modify the content. Consequently, such
activities will still be logged. Malware authors might use other techniques to notify
the victim and also evade the desktop lock monitor. As an example, the ransomware
may display the ransom note via video or audio files rather than locking the desktop.
As we partially discussed, these approaches only make sense if the malware is able to
successfully encrypt user files first. In this case, Unveil can identify those malicious
filesystem accesses as discussed in Section 4.4.

We also believe that the current implementation of text extraction to detect desktop
locker ransomware can be improved. We observed that the change in the structure
of the desktop screenshots is enough to detect a large number of current ransomware
attacks since Unveil exploits the attacker's goal, which is to ensure that the victims
see the ransom note. However, we believe that the text extraction module can be
improved to detect possible evasion techniques an attacker could use to generate the
ransom note (e.g., using uncommon words in the ransom text).

Clearly, there is always the possibility that an attacker will be able to fingerprint
the dynamic analysis environment. For example, stalling code [61] has become
increasingly popular to prevent the dynamic analysis of a sample. Such code takes
longer to execute in a virtual environment, preventing execution from completing
during an analysis. Also, attackers can actively look for signs of dynamic analysis
(e.g., signs of execution in a VM such as well-known hard disk names). Note that
Unveil is agnostic as to the underlying dynamic analysis environment. Hence, as a
mitigation, Unveil can use a sandbox that is more resistant to these evasion techniques
(e.g., [61, 105]). The main contribution of Unveil is not the dynamic analysis
of malware, but rather the introduction of new techniques for the automated, specific
detection of ransomware during dynamic analysis.

Unveil runs within the kernel, and aims to detect user-level ransomware. As a
result, there is the risk that ransomware may run at the kernel level and thwart some
of the hooks Unveil uses to monitor the filesystem. However, this would require
the ransomware to run with administrator privileges to load kernel code or exploit a
kernel vulnerability. Currently, most ransomware runs as user-level programs because
this is sufficient to carry out ransomware attacks. Kernel-level attacks would require
more sophistication, and would increase the difficulty bar for the attackers. Also, if
additional resilience is required, the kernel component of Unveil could be moved
outside of the analysis sandbox.

3.7 Related Work

Many approaches have been proposed to date that have aimed to improve the analysis
and detection of malware. A number of approaches have been proposed to describe
program behavior, from analyzing byte patterns [68, 100, 95, 108] to transparently
running programs in malware analysis systems [5, 58, 57, 104]. Early steps to analyze
and capture the main intent of a program focused on analysis of control flow.
For example, Kruegel et al. [65] and Bruschi et al. [25] showed that by modeling
programs based on their instruction-level control flow, it is possible to bypass some
forms of obfuscation. Similarly, Christodorescu et al. [32] used instruction-level control
flow to design obfuscation-resilient detection systems. Later work focused on analyzing
and detecting malware using higher-level semantic characterizations of their
runtime behavior derived from sequences of system call invocations and OS resource
accesses [59, 60, 31, 72, 98, 109].

Similar to our use of automatically-generated user content, decoys have been used

in the past to detect security breaches. For instance, the use of decoy resources has

been proposed to detect insider attacks [24, 112]. Recently, Juels et al. [50] used

honeywords to improve the security of hashed passwords. The authors show that

decoys can improve the security of hashed passwords since the attempt to use the

decoy password for logins results in an alarm. In other work, Nikiforakis et al. [83]

used decoy files to detect illegally obtained data from file hosting services.

There have also been some recent reports on the ransomware threat. For example,
security vendors have reported on the potential threat of ransomware attacks
based on the number of infections that they have observed [103, 11, 101, 86]. A first
report on specific ransomware families was made by Gazet, who analyzed
three ransomware families including Krotten and Gpcode [43]. The author concluded
that while these early families were designed for massive propagation, they did not
fulfill the basic requirements for mass extortion (e.g., sufficiently long encryption
keys). Recently, Kharraz et al. [56] analyzed 15 ransomware families and provided an
evolution-based study of ransomware attacks. They performed an analysis of charging
methods and the use of Bitcoin for monetization. They proposed several high-level
mitigation strategies such as the use of decoy resources to detect suspicious file access.
Their assumption is that every filesystem access to delete or encrypt decoy resources
is malicious and should be reported. However, they did not implement any concrete
solution to detect or defend against these attacks.

We are not aware of any systems that have been proposed in the literature that
specifically aim to detect ransomware in the wild. In particular, in contrast to existing
work on generic malware detection, Unveil detects behavior specific to ransomware
(e.g., desktop locking, patterns of filesystem accesses).

3.8 Conclusions

In this chapter we presented Unveil, a novel approach to detecting and analyzing
ransomware. Our system is the first in the literature to specifically identify typical
behavior of ransomware such as malicious encryption of files and locking of user
desktops. These are behaviors that are difficult for ransomware to hide or change.
The evaluation of Unveil shows that our approach was able to correctly detect
13,637 ransomware samples from multiple families in a real-world data feed with
zero false positives. In fact, Unveil outperformed all existing AV scanners and a
modern industrial sandboxing technology in detecting both superficial and technically
sophisticated ransomware attacks. Among our findings was also a new ransomware
family that no security company had previously detected before we submitted it to
VirusTotal.

Chapter 4

Protecting End-Points from

Ransomware Attacks

4.1 Introduction

Ransomware continues to be one of the most important security threats on the Internet.
While ransomware is not a new concept (such attacks have been in the wild
since the last decade), the growing number of high-profile ransomware attacks [12,
29, 34, 44] has resulted in growing concern about how to defend against this class of
malware. In 2016, several public and private sectors including the healthcare industry
were impacted by ransomware [22, 13, 107]. Recently, US officials have also expressed
their concerns about ransomware [38, 49], and even asked the U.S. government to
focus on fighting ransomware under the Cybersecurity National Action Plan [49].

In response to the increasing ransomware threat, users are often advised to create
backups of their critical data. Certainly, having a reliable data backup policy
minimizes the potential costs of being infected with ransomware, and is an important
part of the IT management process. However, the growing number of paying
victims [15, 81, 40] suggests that unsophisticated users, who are the main target
of these attacks, do not follow these recommendations and easily become paying
victims of ransomware. Hence, ransomware authors continue to create new attacks
and evolve their creations, as evidenced by the emergence of more sophisticated
ransomware every day [103, 11, 101, 86].

Law enforcement agencies and security firms have recently launched a program
to assist ransomware victims in retrieving their data without paying ransom fees to
cybercriminals [84]. The main idea behind this partnership is that reverse engineers
analyze the cryptosystems used by the malware to extract secret keys or find design
flaws in the way the sample encrypts or deletes files. While there are ransomware
families that are infamous for using weak cryptography [56, 28, 67], newer ransomware
variants, unfortunately, have learned from past mistakes by relying on strong
cryptographic primitives provided by standard cryptographic libraries. In response to the
increasing number of ransomware attacks, a desirable and complementary defense
would be to augment the operating system with transparent techniques that would
make the operating system resistant against ransomware-like behavior. However, an
endpoint approach to defend against unknown ransomware attacks would need to
immediately stop attacks once the ransomware starts destroying files, and should be
able to recover any lost data.

This work presents a generic, real-time ransomware protection approach to overcome
the limitations of existing approaches with regard to detecting ransomware.
Our technique is based on two main components. First, an abstract characterization
of the behavior of a large class of current ransomware attacks is constructed. More
precisely, our technique applies the results of a long-term dynamic analysis to binary
objects to determine if a process matches the abstract model. A process is labeled as
malicious if it exhibits behaviors that match the abstract model. Second, Redemption
employs a high-performance, high-integrity mechanism to protect and restore
all attacked files by utilizing a transparent data buffer to redirect access requests
while tracking the write contents.

In this work, we demonstrate that by augmenting the operating system with a set
of lightweight and generic techniques, which we collectively call Redemption, it is
possible to stop modern ransomware attacks without changing the semantics of the
underlying file system's functionality, or performing significant changes in the
architecture of the operating system. Our experiments on 29 contemporary ransomware
families show that our approach can be successfully applied in an application-transparent
manner, and can significantly enhance the current protection capabilities against
ransomware (achieving a true positive [TP] rate of 100% at 0.8% false positives [FPs]).
Finally, we show that this goal can be achieved without a discernible performance
impact, or other changes to the way users interact with standard operating systems.

To summarize, we make the following contributions.

• We present a general approach to defending against unknown ransomware attacks
  in a transparent manner. In this approach, access to user files is mediated,
  and privileged requests are redirected to a protected area, maintaining the
  consistent state of user data.

• We show that efficient ransomware protection with zero data loss is possible.

• We present a prototype implementation for Windows, and evaluate it with real
  users to show that the system is able to protect user files during an unknown
  ransomware attack while imposing no discernible performance overhead.

The rest of the chapter is structured as follows. Section 4.2 presents related
work. In Section 4.3, we present the threat model. In Section 4.4, we elaborate
on the architecture of Redemption, and Section 4.5 describes the detection approach.
In Section 4.6, we provide more details about
the implementation of the system. In Section 4.7, we present the evaluation results.
Limitations of the approach are discussed in Section 4.8. Finally, Section 4.9 concludes
the chapter.

4.2 Related Work

The first scientific study on ransomware was performed by Gazet [43], who
analyzed three ransomware families and concluded that the techniques incorporated in
those samples did not fulfill the basic requirements for mass extortion. The recent
resurgence of ransomware attacks has attracted the attention of several researchers
once more. Kharraz et al. [56] analyzed 15 ransomware families including desktop
locker and cryptographic ransomware, and provided an evolution-based study on
ransomware attacks. The authors concluded that a significant number of ransomware
samples in the wild have a very similar strategy to attack user files, and can be
distinguished from benign processes. In another work, Kharraz et al. [53] proposed
Unveil, a dynamic analysis system that is specifically designed to assist reverse
engineers in analyzing the intrinsic behavior of an arbitrary ransomware sample.
Unveil is not an end-point solution, and no real end-user interaction was involved in
their test. Redemption is an end-point solution that aims to differentiate between
benign and malicious ransomware-like access requests to the file system.

Scaife et al. [85] proposed CryptoDrop, which is built upon the premise that the
malicious process aggressively encrypts user files. In the paper, as a limitation of
CryptoDrop, the authors state that the tool does not provide any recovery or minimal
data loss guarantees. Their approach is able to detect a ransomware attack after a
median of ten file losses. Redemption does not have this limitation as it is designed
to protect the consistent state of the original files by providing full data recovery if
an attack occurs. Hence, unlike CryptoDrop, Redemption guarantees minimal data
loss and is resistant to most realistic evasion techniques that malware authors may
use in the future.

Very recently, Continella et al. [35] and Kolodenker et al. [62] concurrently and
independently proposed protection schemes to detect ransomware. Continella et al. [35]
proposed ShieldFS, which has a goal similar to ours. The authors also look at the
file system layer to find typical ransomware activity. While ShieldFS is a significant
improvement over the status quo, it would be desirable to complement it with a
more generic approach which is also resistant to unknown cryptographic functions.
Unlike ShieldFS, Redemption does not rely on cryptographic primitive identification,
which can result in false positive cases. More importantly, this was a conscious
design choice to minimize the interference with the normal operation of processes,
minimize the risk of process crashes, and avoid intrusive pop-up prompts, which can
have noticeable usability side-effects.

Kolodenker et al. [62] proposed PayBreak, which securely stores cryptographic
encryption keys in a key vault that is used to decrypt affected files after a ransomware
attack. In fact, PayBreak intercepts calls to functions that provide cryptographic
operations, encrypts symmetric encryption keys, and stores the results in the key
vault. After a ransomware attack, the user can decrypt the key vault with their private
key and decrypt the files without making any payments. The performance evaluation
of the system also shows that PayBreak imposes negligible overhead compared to a
reference platform. Similar to ShieldFS, PayBreak relies on identifying functions that
implement cryptographic primitives. As mentioned earlier, Redemption does not
depend on any hooking technique to identify cryptographic functions. Furthermore,
the detection accuracy of Redemption is not impacted by the type of packer a
ransomware family may use to evade common anti-malware systems. This makes
Redemption a more generic solution to the same problem space.

The evaluation of Redemption covers a significantly larger number of ransomware
families compared to [35, 85] and shows it can successfully identify unseen
ransomware attacks after observing a median of five exposed files without any data
loss. Indeed, Redemption shares some similarity with CryptoDrop, ShieldFS, and
PayBreak due to the common characteristics of ransomware attacks. However,
extracting such behavior of ransomware is not the main contribution of the work, as it
has been comprehensively discussed in several security reports. Rather, Redemption
introduces a high-performance, data-loss-free end-user protection
framework against ransomware that protects the consistent state of the entire user
space and can be used as an augmented service to the operating system. We are not
aware of any other scientific work on the protection against ransomware attacks.

4.3 Threat Model

In this work, we assume that ransomware can employ any standard, popular techniques
to attack machines similar to other types of malware. That is, ransomware
can employ several strategies to evade the detection phase, compromise vulnerable
machines, and attack the user files. For example, a ransomware instance could be
directly started by the user, delivered by a drive-by download attack, or installed via
a simple dropper or a malicious email attachment.

We also assume that the malicious process can employ any techniques to generate
the encryption key, use arbitrary encryption key lengths, or in general, utilize
any customized or standard cryptosystems to lock the files. Ransomware can access
sensitive resources by generating new processes, or by injecting code into benign
processes (i.e., similarly to other classes of malware). Furthermore, we assume that a
user can install and run programs from arbitrary untrusted sources, and therefore,
that malicious code can execute with the privileges of the user. This can happen in
several scenarios. For instance, a user may install, execute and grant privileges to a
malicious application that claims to be a well-known legitimate application, but in
fact, delivers malicious payloads, including ransomware.

In addition, in this work, we also assume that the trusted computing base includes

the display module, OS kernel, and underlying software and hardware stack. There-

fore, we can safely assume that these components of the system are free of malicious

code, and that normal user-based access control prevents attackers from running ma-

licious code with superuser privileges. This is a fair assumption considering the fact

that ransomware attacks mainly occur in the user-mode.

4.4 Design Overview

In this section, we provide our design goals for Redemption. We refer the reader to
Section 4.6 for details of our prototype implementation. Redemption has two main
components. First, a lightweight kernel module intercepts process interactions,
stores the events, and manages the changes in a protected area. Second, a user-mode
daemon, called the behavioral monitor and notification module, assigns a
malice score to a process, and is used to notify the user about the potentially malicious
behavior of a process.

Figure 4-1: Redemption mediates the access to the file system and redirects each write
request on the user files to a protected area without changing the status of the original file.

Intercepting Access Requests. In order to implement a reliable dynamic access
control mechanism over user data, this part of the system should be implemented in
the kernel, and be able to mediate the access to the file system. The prototype
redirects each write access request to the user files to a protected area without changing
the status of the original file. We explain more details on how we implemented the
write redirection semantics in Section 4.6.

Figure 4-1 presents an example that illustrates how access requests are processed.
In an unmodified system, the request would succeed if the corresponding file exists,
and as long as the process holds the permission. The system introduces the following
changes.
(1) Redemption receives the request A from the application X to access
the file F at the time t.
(2) If A_t requests access with write or delete privilege to the file
F, and the file F resides in a user-defined path, Redemption's monitor is called.
(3) Redemption creates a corresponding file in the protected area, called the reflected
file, and handles the write requests. These changes are periodically flushed to the
storage to ensure that they are physically available on the disk. The meta-data entry
of the corresponding file is updated with the offset and length of the data buffer in the
I/O request after a successful data write at Step 3.
(4) The malice score of the process
is updated, and is compared to a pre-configured threshold α.
(5) The Redemption
monitor sends a notification to the display monitor to alert the user depending on the
calculated malice score.
(6) A success/failure notification is generated, and is sent to
the system service manager.
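The redirection in Steps 2 and 3 can be pictured with a small user-space model. This is only a sketch: the actual prototype operates in the Windows kernel, and the class name `ReflectedFile`, the `demo` helper, and the in-memory metadata list are hypothetical names introduced for illustration.

```python
import os
import tempfile


class ReflectedFile:
    """Toy user-space model of a reflected file (illustrative, not the prototype).

    Writes aimed at the original file are stored in a shadow file inside a
    protected directory, together with (offset, length) metadata.
    """

    def __init__(self, original_path, protected_dir):
        self.original_path = original_path
        self.shadow_path = os.path.join(protected_dir, os.path.basename(original_path))
        self.metadata = []  # one (offset, length) entry per redirected write
        # Create an empty shadow file; the original is never opened for writing.
        open(self.shadow_path, "wb").close()

    def write(self, offset, data):
        # Redirect the write to the shadow file and force it to stable storage.
        with open(self.shadow_path, "r+b") as shadow:
            shadow.seek(offset)
            shadow.write(data)
            shadow.flush()
            os.fsync(shadow.fileno())
        # Record the modified region only after the write has succeeded.
        self.metadata.append((offset, len(data)))


def demo():
    with tempfile.TemporaryDirectory() as root:
        protected = os.path.join(root, "protected")
        os.mkdir(protected)
        original = os.path.join(root, "report.txt")
        with open(original, "wb") as f:
            f.write(b"hello world")
        reflected = ReflectedFile(original, protected)
        reflected.write(6, b"WORLD")
        with open(original, "rb") as f:
            original_bytes = f.read()
        with open(reflected.shadow_path, "rb") as f:
            shadow_bytes = f.read()
        return original_bytes, shadow_bytes, reflected.metadata
```

In this model the original file keeps its content after the write, while the shadow file and the metadata together describe exactly which region the process tried to change.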

Data Consistency. An important requirement for Redemption is to be able to
guarantee data consistency during the interaction of applications with the file system.
A natural question that arises here is what happens if the end-user confirms that the
suspicious operations on the file that were detected by the system are in fact benign.
In this case, having a consistency model is essential to protect the benign changes to
the user files without on-disk data corruption. The implementation of the consistency
policy should maintain the integrity properties the applications desire from the file
system. Failure to do so can lead to corrupted application states and catastrophic data
loss. For this reason, the system does not change the file system semantics that may
affect the crash guarantees that the file system provides. To this end, Redemption
operates in three steps: (1) it reads the meta-data generated for the reflected file,
creates write requests based on the changed data blocks, and changes the status
of these blocks to committed, (2) upon receiving the confirmation notification, the
system updates the meta-data of the reflected file from committed to confirmed, and
(3) the reflected file is deleted from the protected area. Figure 4-2 briefly illustrates
the steps involved in committing the changes to the user data.

[Figure 4-2 diagram. Components: system services (user space); Redemption monitor,
filesystem driver, disk, and protected area holding the reflected files and their
metadata (offsets, buffer lengths) (kernel space). Steps: (1) reading the metadata of
the reflected files, (2) creating requests and committing the changes to the original
files, (3) returning the result.]

Figure 4-2: The steps involved in committing the benign changes to the files.

Another question that arises here is how the system protects the consistency of
the original file during the above-mentioned three-step procedure if a system crash
occurs. In case of a crash, the system works as follows: (1) If data is committed (Step
1), but the corresponding meta-data is not updated (Step 2), the system treats the
change as incomplete, and discards the change as a rollback of an incomplete change.
This means that Step 2 was only partially completed before the crash, so the system
repeats Step 1. (2) If the meta-data of the reflected file is updated to confirmed,
it means that the benign changes to the file have been successfully committed to the
original file. In this case, the reflected file is removed from the protected area. Note
that a malicious process may attack the Malice Score Calculation (MSC) function
by trying to keep the malice score of the process low while performing destructive
changes. We elaborate more on these scenarios in Section 4.8.
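The three-step commit and its crash handling can be modeled as a small state machine. The sketch below is an illustration under assumptions rather than the prototype's code: the dictionary layout, the status strings, and the functions `apply_regions`, `commit`, and `recover` are hypothetical, and buffers are held in memory instead of on disk.

```python
def apply_regions(original, shadow, regions):
    """Step 1: copy each modified (offset, length) region from shadow to original."""
    for offset, length in regions:
        end = offset + length
        if len(original) < end:
            # Grow the original if a redirected write extended the file.
            original.extend(b"\x00" * (end - len(original)))
        original[offset:end] = shadow[offset:end]


def commit(entry, crash_before_confirm=False):
    """Commit user-approved changes: committed -> confirmed -> reflected file removed."""
    apply_regions(entry["original"], entry["shadow"], entry["regions"])
    entry["status"] = "committed"
    if crash_before_confirm:
        return  # simulated crash between Step 1 and Step 2
    entry["status"] = "confirmed"  # Step 2: meta-data updated
    entry["shadow"] = None         # Step 3: reflected file deleted


def recover(entry):
    """Crash recovery as described in the text."""
    if entry["status"] == "committed":
        # Meta-data was never confirmed: treat the change as incomplete and redo it.
        commit(entry)
    elif entry["status"] == "confirmed":
        # Changes already reached the original file: only clean up.
        entry["shadow"] = None
```

Repeating Step 1 is safe in this model because copying the same regions twice leaves the original in the same state, which is the property that makes the redo-on-recovery strategy work.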

User Notification. The trusted output that Redemption utilizes is a visual alert
shown whenever a malicious process is detected. We have designed the alert messages
to be displayed at the top of the screen to be easily noticeable. Since benign
applications usually require sophisticated inputs (i.e., clicking on specific buttons, filling
out the path prompt) from the user before performing any sensitive operation on the
files, the user is highly likely to be present and interacting with the computer, making
it difficult for her to miss an alert.

4.5 Detection Approach

As mentioned earlier, an important component of Redemption is to perform
system-wide application monitoring. For each process that requires privileged access to user
files, we assign a malice score. The malice score of a process represents the risk
that the process exhibits ransomware behavior. That is, the malice score determines
whether the Redemption monitor should allow the process to access the files, or
notify the user. In the following, we explain the features we used to calculate the
malice score of a process. The features mainly target content-based (i.e., changes
in the content of each file) and behavior-based (i.e., cross-file behavior of a process)
characteristics of ransomware attacks.

4.5.1 Content-based Features

Entropy Ratio of Data Blocks. For every read and write request to a file, Redemption
computes the entropy [69] of the corresponding data buffers in the I/O
traces similar to [53]. Comparing the entropy of read and write requests to and from
the same file offset serves as an excellent indicator of ransomware behavior. This is
due to the popular strategy of reading in the original file data, encrypting it, and
writing the encrypted version.
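As a minimal sketch of this comparison, the following computes the Shannon entropy of a buffer in bits per byte and checks whether a write raises the entropy relative to the data read from the same offset. The function names are hypothetical, and the prototype's exact entropy computation is not specified here.

```python
import math
from collections import Counter


def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    n = len(buf)
    counts = Counter(buf)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_increased(read_buf: bytes, write_buf: bytes) -> bool:
    """True if the written data has higher entropy than the data read."""
    return shannon_entropy(write_buf) > shannon_entropy(read_buf)
```

Encrypted or compressed output tends toward 8 bits per byte, whereas typical documents score noticeably lower, which is why a read-then-write jump in entropy at the same offset is informative.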

File Content Overwrite. Redemption monitors how a process requests write
access to data blocks. In a typical ransomware attack, in order to minimize the
chance of recovering files, the malicious process overwrites the content of the user
files with random data. Our system increases the malice score of a process as the
process requests write access to different parts of a file. In fact, a process is assigned
a higher malice score if it overwrites all the content of the files.

Delete Operation. If a process requests to delete a file that belongs to the end-user,
it receives a higher malice score. Ransomware samples may not overwrite the data
blocks of the user files directly, but rather generate an encrypted version of the file,
and delete the original file.

4.5.2 Behavior-based Features

Directory Traversal. During an attack, the malicious process often arbitrarily lists
user files, and starts encrypting the files with an encryption key. A process receives
a higher malice score if it is iterating over files in a given directory. Note that a
typical benign encryption or compression program may also iterate over the files in
a directory. However, the generated requests are usually for reading the content of
the files, and the encrypted or compressed version of the file is written in a different
path. The intuition here is that the ransomware usually intends to lock as many files
as possible to force the victim to pay.

Converting to a Specific File Type. A process receives a higher malice score
if it converts files of differing types and extensions to a single known or unknown
file type. The intuition here is that in many ransomware attacks, unlike most
benign applications that are specifically designed to operate on specific types of files,
the malicious process targets all kinds of user files. To this end, Redemption logs if
a process requests access to widely varying classes of files (i.e., videos, images,
documents). Note that accessing multiple files with different extensions is not necessarily
malicious. A representative example is a media player that plays .mp3 files (audio)
as well as .avi files (video). However, such applications typically open the files
with read permission, and more importantly, only generate one request in a short
period of time since the application requires specific inputs from the user. Hence,
the key insight is that a malicious ransomware process would overwrite or delete the
original files.

Access Frequency. If a process frequently generates write requests to user files,
we give this process a higher malice score. We monitor δ, the time between
two consecutive write access requests on two different user files. Our intuition is
that ransomware attacks programmatically list the files and request access to files.
Therefore, the δ between two write operations on two different files is not very long,
unlike benign applications that usually require some input from the user first in order
to perform the required operation.
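One way to picture the measurement of δ is a per-process tracker that remembers the last write and reports the elapsed time whenever the next write targets a different file. The class `WriteIntervalTracker` and its `observe` method are hypothetical names for this sketch, and timestamps are assumed to be in seconds.

```python
class WriteIntervalTracker:
    """Tracks the time between consecutive writes to different files, per process."""

    def __init__(self):
        self.last_write = {}  # pid -> (path, timestamp) of the most recent write

    def observe(self, pid, path, timestamp):
        """Record a write and return δ in seconds, or None if no δ is defined.

        δ is defined only when the previous write by the same process
        targeted a different file.
        """
        previous = self.last_write.get(pid)
        self.last_write[pid] = (path, timestamp)
        if previous is None or previous[0] == path:
            return None
        return timestamp - previous[1]
```

A process that walks a directory and rewrites every file would produce a stream of small δ values, whereas an interactive editor would mostly return `None` or large gaps.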

4.5.3 Evaluating the Feature Set

Indeed, the assumption that all the features are equally important hardly holds true
in real-world scenarios. Therefore, we performed a set of measurements to relax this
assumption. We used a Recursive Feature Elimination (RFE) approach to determine
the significance of each feature. To this end, the analysis started by incorporating
all the features and measuring the FP and TP rates. Then, in each step, the feature
with the minimum weight was removed and the FP and TP rates were calculated by
performing 10-fold cross-validation to quantify the contribution of each feature. The
assigned weights were then used as the coefficients of the features in Equation 4.1 in
Section 4.5.4.

Our experiments on several combinations of features show that the highest false
positive rate is 5.9%, and is produced when Redemption only incorporates content-based
features (F1). The reason for this is that file compression applications, when
configured to delete the original files, are reported as false positives. During our
experiments, we also found that in document editing programs such as Microsoft
PowerPoint or Microsoft Paint, if the user inserts a large image in the editing area,
the content-based features that monitor content traversal or payload entropy falsely
report the application as being anomalous. However, when behavior-based features
were incorporated, such programs did not receive a high anomaly score since there were
no cross-file activities with write privilege similar to ransomware attacks. When all
the features are combined (i.e., F12), the minimum false positive rate (0.5% FP with
100% TPs) is produced on the labeled dataset. Hence, we use the combination of all the
features in our system.

4.5.4 Malice Score Calculation (MSC) Function

The MSC function allows the system to identify the suspicious process and notify the
user when the process matches the abstract model. Given a process X, we assign a
malice score S to the process each time it requests privileged access to a user file. If
the malice score S exceeds a pre-defined malice threshold α, it means that the process
exhibits abnormal behaviors. Hence, we suspend the process and inform the user to
confirm the suspicious action. In the following, we provide more details on how we
determine the malice score for each process that requests privileged operations on
user files:

(r1): A process that changes the entropy of the data blocks between a read and
a write request to a higher value receives a higher malice score. The required value
is calculated as an additive inverse of the entropy value of the read and write ratio, and
lies in [0,1], meaning that the higher the entropy in the write operation,
the closer the value is to 1. If the entropy of the data block in the write operation is
smaller than in the read operation, we assign the value 0 to this feature.

(r2): If a process iterates over the content of a file with write privilege, it will receive
a higher malice score. If the size of the file A is $s_A$, and $y_A$ is the total size of the
data blocks modified by the process, the feature is calculated as $\frac{y_A}{s_A}$, where the higher
the number of data blocks modified by the process, the closer the value is to 1.

(r3): If a process requests to delete a file, this behavior is marked as being suspicious.
If a process exhibits such I/O activities, the value 1 is assigned to r3.

(r4): Redemption monitors if the process traverses over the user files with write
privilege, and computes the additive inverse of the number of privileged accesses to
unique files in a given path. The output of the function lies in [0,1]. Given
a process X, the function assigns a higher malice score as X generates more write
requests to access files in a given path. Here, $write(X, f_i)$ is the $i$-th independent write
request generated by the process X on a given file $f_i$.

(r5): Given a set of document classes, Redemption monitors whether the process
requests write access to files that belong to different document classes. The file A and
file B belong to two different document classes if the program that opens file A cannot
take file B as a valid input. For example, a docx and a pdf file belong to two different
document classes since a docx file cannot be opened via a PDF editor program. We
assign the score 1 if the process performs cross-document access requests similar to
ransomware.

(r6): The system computes the elapsed time (δ) between two subsequent write
requests generated by a single process to access two different files. $\frac{1}{\delta}$ represents the
access frequency. As the elapsed time between two write requests increases, the
access frequency decreases.

We define the overall malice score of a process at time t by applying the weights of
individual features:

\[
MSC(r) = \frac{\sum_{i=1}^{k} w_i \times r_i}{\sum_{i=1}^{k} w_i} \tag{4.1}
\]

where $w_i$ is the predefined weight for the feature $i$ in the MSC function. The value
of $w_i$ is based on the experiment discussed in Section 4.5.3. The weights we used in
Equation 4.1 are $w_1 = 0.9$, $w_2 = 1.0$, $w_3 = 0.6$, $w_4 = 1.0$, $w_5 = 0.7$, $w_6 = 1.0$.
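The weighted average in Equation 4.1 is straightforward to sketch. In the code below, the weights are the ones reported in the text; everything else is an assumption made for illustration: the function names, the reading of r1 as one minus the read-to-write entropy ratio, the cap on r2, and the premise that every feature value has already been normalized into [0,1].

```python
WEIGHTS = (0.9, 1.0, 0.6, 1.0, 0.7, 1.0)  # w1..w6 as reported in the text


def entropy_feature(read_entropy, write_entropy):
    """Assumed reading of r1: 1 - H_read / H_write, and 0 if entropy does not rise."""
    if write_entropy <= read_entropy or write_entropy <= 0:
        return 0.0
    return 1.0 - read_entropy / write_entropy


def overwrite_feature(modified_bytes, file_size):
    """r2 = y_A / s_A; the cap at 1.0 is a safeguard added in this sketch."""
    if file_size <= 0:
        return 0.0
    return min(1.0, modified_bytes / file_size)


def msc(features, weights=WEIGHTS):
    """Equation 4.1: weighted average of the feature values r1..rk."""
    if len(features) != len(weights):
        raise ValueError("one feature value is required per weight")
    return sum(w * r for w, r in zip(weights, features)) / sum(weights)
```

As a worked example, a process that triggers only the three content-based features (r1 = r2 = r3 = 1, the rest 0) scores (0.9 + 1.0 + 0.6) / 5.2 ≈ 0.48, while a process that triggers all six features scores 1. Whether either value is reported depends on the threshold α, which is not fixed in this sketch.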

Note that when Redemption is active, even when using all the combined features,
file encryption or secure deletion applications are typically reported as being
suspicious. As mentioned earlier, such applications generate very similar requests to
access user files as ransomware does. For example, in a secure deletion application,
the process iterates over the entire content of the given file with write privileges, and
writes random payloads on the contents. The same procedure is repeated over the
other files in the path. Hence, such cases are reported to the user as violations, or
other inappropriate uses of their critical resources.

4.6 Implementation

In this section, we provide the implementation details of Redemption. Note that
our design is sufficiently general to be applied to any OS that is a potential target for
ransomware. However, we built our prototype for the Windows environment, which
is the main target of current ransomware attacks.

Monitoring Access Requests. Redemption must interpose on all privileged
accesses to sensitive files. The implementation of the system is based on the Windows
Kernel Development framework without any modifications to the underlying file system
semantics. To this end, it suffices on Windows to monitor the write or delete
requests from the I/O system to the base file system driver. Furthermore, to guarantee
minimal data loss, Redemption redirects the write requests from the user files
to the corresponding reflected files. The reflected files are implemented via sparse
files on NTFS. In fact, the NTFS file system does not allocate hard disk drive space
to reflected files except in regions where they contain non-zero data. When a process
requests to open a user file, a sparse file with the same name is created/opened in the
protected area. The sparse files are created by calling the function FltFsControlFile
with the control code FSCTL_SET_SPARSE. The size of the file is then set by calling
FltSetInformationFile with the size of the original file.

Redemption updates the FileName field in the file object of the create request
with the sparse file. By doing this, the system redirects the operation to the reflected
file, and the corresponding handle is returned to the requesting process. The write
request is executed on the file handle of the reflected file which has been returned to
the process at the opening of the file. Each write request contains the offset and the
length of the data block that the process wishes to write the data to.

If the system performs the write request successfully, the corresponding metadata
of the reflected file (the offset and the length of the modified regions of the
original file) is recorded for that write request. In our prototype, the metadata
entry that represents the modified regions is implemented via reparse points, a
Windows mechanism for attaching application-specific data to a file. The data is
interpreted by Redemption, which sets the tags. When the system sets a reparse
point, a unique reparse tag is associated with it, which is then used to identify the
offset and the length of every change. The reparse point is set by calling FltTagFile
when the file is created by Redemption. On subsequent accesses to the file in the
protected area, the reparse data is parsed via FltFsControlFile with the appropriate
control code (i.e., FSCTL_GET_REPARSE_POINT). Hence, the redirection is achieved by
intercepting the original write request, performing the write, and completing the
original request while tracking the write contents.
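The redirection and change-tracking steps described above can be pictured with a small user-space model. The sketch below is illustrative Python, not the kernel minifilter: the class `ReflectedFile` and its methods are hypothetical names, the sparse file is approximated by an in-memory list of writes, and the (offset, length) pairs stand in for the reparse-point metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ReflectedFile:
    """Toy stand-in for a reflected file kept in the protected area."""
    original: bytes                              # content of the user file
    writes: list = field(default_factory=list)   # (offset, data), in issue order

    def write(self, offset: int, data: bytes) -> None:
        # The write is redirected here; the original content stays untouched.
        self.writes.append((offset, bytes(data)))

    def modified_regions(self) -> list:
        # Metadata the text describes: offset and length of every change.
        return [(offset, len(data)) for offset, data in self.writes]

    def view(self) -> bytes:
        # What the writing process would read back: original plus its writes.
        buf = bytearray(self.original)
        for offset, data in self.writes:
            end = offset + len(data)
            if end > len(buf):
                buf.extend(b"\x00" * (end - len(buf)))
            buf[offset:end] = data
        return bytes(buf)

    def commit(self) -> None:
        # Process judged benign: apply the tracked changes to the original.
        self.original = self.view()
        self.writes.clear()

    def discard(self) -> None:
        # Process judged malicious: drop the changes, original is intact.
        self.writes.clear()
```

In this model, a process that overwrites a file only ever touches the reflected copy, so discarding the tracked writes restores the user's view of the data without any recovery step on the original.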

The consistency of the data redirected to the sparse files is an important design
requirement of the system. Frequent flushing is therefore required to avoid potential
user data loss. This approach is not without cost, since multiple write requests are
needed to ensure that critical data reaches persistent media. To this end, we follow
the approach recommended by Microsoft: we open sparse files for unbuffered I/O upon
creation and enable write-through caching via the FILE_FLAG_NO_BUFFERING and
FILE_FLAG_WRITE_THROUGH flags. With write-through caching enabled, data is still
written into the cache, but the cache manager writes the data to disk immediately
rather than incurring a delay by using the lazy writer. Microsoft recommends this
approach as a replacement for calling the FlushFileBuffers function after each
write, which usually causes unnecessary performance penalties in such applications.
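The flags named above belong to the Win32 file-creation API. As a rough user-space analogue only, the following Python sketch contrasts a write-through style open (using the POSIX `O_SYNC` flag where the platform provides it) with an explicit flush after every write. The function names are hypothetical and the code is not taken from the Redemption driver.

```python
import os


def write_through(path: str, chunks) -> None:
    """Open once in synchronous mode, then write each chunk.

    With O_SYNC, each write returns only after the data has been handed to
    stable storage. On platforms without O_SYNC the flag falls back to 0
    and this becomes an ordinary write.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for chunk in chunks:
            os.write(fd, chunk)
    finally:
        os.close(fd)


def write_then_flush(path: str, chunks) -> None:
    """Buffered writes with an explicit flush and fsync after every chunk."""
    with open(path, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())
```

Both functions leave the same bytes on disk; the difference is whether durability is requested once at open time or through an extra call after each write.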

Behavioral Detection and Notification Module. We implemented this module
as a user-mode service. This was a conscious design choice, similar to the design of
most anti-malware solutions. Note that Microsoft officially supports protected
services through its Early Launch Anti-Malware (ELAM) mechanism, which allows
anti-malware user-mode services to be launched as protected services. After the
service is launched as a protected service, Windows uses code integrity to allow only
trusted code to load into it. Windows also protects these processes from code
injection and other attacks from admin processes [79]. If Redemption identifies a
malicious process, it automatically terminates that process.

4.7 Evaluation

The Redemption prototype supports all Windows platforms. In our experiments,
we used Windows 7 and simply attached Redemption to the file system. We
took popular anti-evasion measures similar to those in our experiments in Chapter 3.
The remainder of this section discusses how the benign and malicious datasets were
collected, and how we conducted the experiments to evaluate the effectiveness of our
approach.

4.7.1 Dataset

The ground truth dataset consists of file system traces of manually confirmed
ransomware samples as well as more than 230 GB of data that contains the interaction
of benign processes with the file system on multiple machines. We used this dataset to
verify the effectiveness of Redemption, and to determine the best threshold value
to label a suspicious process.

Collecting Ransomware Samples. We collected ransomware samples from public
repositories [1, 4] that are updated on a daily basis, and from online forums that share
malware samples [3, 71]. In total, we collected 9,432 recent samples, and we confirmed
1,174 of them to be active ransomware from 29 contemporary ransomware families. We
used 504 of the samples from 12 families in our training dataset. Table 4.2 describes
the dataset we used in this experiment.

Collecting Benign Applications. One of the challenges in testing Redemption
was to collect a sufficient amount of benign data that represents realistic use of the
file system for model training purposes. To test the proposed approach with
realistic workloads, we deployed a version of Redemption on five separate Windows
7 machines in two different time slots, each lasting seven days, and collected more
than 230 GB of data. The users of the machines were advised to perform their daily
activities on their machines. Redemption operated in monitoring mode, and did not
collect any sensitive user information such as credentials, browsing history, or
personal data. The collected information only included the interaction of processes
with the file system, which was required to model benign interaction with the file
system. All the extracted data was anonymized before performing any further
experiments. Based on the collected dataset, we created a pool of application traces
that consisted of 65 benign executables, including applications that exhibit
ransomware-like behavior such as secure deletion, encryption, and compression. The
application pool consisted of document editors (e.g., Microsoft Word), audio/video
editors (e.g., Microsoft Live Movie Maker, Movavi Video Editor), file compression
tools (e.g., Zip, WinRAR), file encryption tools (e.g., AxCrypt, AESCrypt), and
popular web browsers (e.g., Firefox, Chrome). Due to space limitations, we provide
only a subset of the benign applications used in our analysis in Table 4.1.

4.7.2 Detection Results

As discussed in Section 4.4, one of the design requirements of the system is to produce
low false positives, and to minimize the number of unnecessary notifications for the
user. To this end, the system employs a threshold value to determine when an
end-user should be notified about the suspicious behavior of a process.

We tested a large set of benign as well as ransomware samples on a
Redemption-enabled machine. As depicted in Table 4.1 and Table 4.2, the median score
of benign applications is significantly lower than that of ransomware samples. For
file encryption programs such as AxCrypt, which are specifically designed to protect
the privacy of users, the original file is overwritten with random data once the
encrypted version is generated. In this case, Redemption reports the action as
malicious, which is in fact a false positive. Unfortunately, such false positives are
inevitable, since these programs exhibit exactly the behavior of typical ransomware.
In such cases, Redemption informs the end-user and asks for manual confirmation.

Given these corner cases, we set the malice score threshold to α = 0.12, where the
system achieves the best detection and false positive rates (FP rate = 0.5% at a TP
rate of 100%). Figure 4-3 shows the false positive and true positive rates as a
function of the malice score on the labeled dataset. This malice threshold is still
significantly lower than the minimum malice score of all the ransomware families in
the dataset, as provided in Table 4.2. The table also shows the median file recovery
rate. As depicted, Redemption detects a malicious process and successfully recovers
encrypted data after observing four files on average. Our experiment on the dataset
also showed that 7 GB of storage is sufficiently large for the protected area to
enforce the data consistency policy.
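The threshold choice can be sanity-checked against the score ranges reported in Tables 4.1 and 4.2. The sketch below is not the evaluation procedure itself, which scores individual process traces. It only uses the published per-application maximum scores and per-family minimum scores, assumes that a score above the threshold triggers a notification, and uses a hypothetical helper named `flagged`.

```python
ALPHA = 0.12  # malice threshold selected in the text

# Maximum malice score per benign application (Table 4.1).
BENIGN_MAX = {
    "Adobe Photoshop": 0.088, "AESCrypt": 0.72, "AxCrypt": 0.75,
    "Adobe PDF reader": 0.0, "Adobe PDF Pro": 0.039, "Google Chrome": 0.044,
    "Internet Explorer": 0.045, "Matlab": 0.92, "MS Word": 0.089,
    "MS PowerPoint": 0.102, "MS Excel": 0.019, "VLC Player": 0.0,
    "Vera Crypt": 0.71, "WinRAR": 0.16, "Windows Backup": 0.0,
    "Windows paintit": 0.083, "SDelete": 0.638, "Skype": 0.013,
    "Spotify": 0.011, "Sumatra PDF": 0.041, "Zip": 0.16,
}

# Minimum malice score per ransomware family (Table 4.2).
RANSOMWARE_MIN = {
    "Cerber": 0.41, "Cryptolocker": 0.36, "CryptoWall3": 0.4,
    "CryptXXX": 0.49, "CTB-Locker": 0.38, "CrypVault": 0.53,
    "CoinVault": 0.42, "Filecoder": 0.52, "GpCode": 0.52,
    "TeslaCrypt": 0.43, "Virlock": 0.51, "SilentCrypt": 0.31,
}


def flagged(scores: dict, threshold: float) -> list:
    """Return the names whose score exceeds the threshold, sorted by name."""
    return sorted(name for name, score in scores.items() if score > threshold)


# Benign applications that could raise an alert at least once.
benign_alerts = flagged(BENIGN_MAX, ALPHA)

# Families whose lowest-scoring sample would slip under the threshold.
missed_families = [f for f, score in RANSOMWARE_MIN.items() if score <= ALPHA]
```

With these inputs no ransomware family falls under the threshold, while the benign applications that exceed it are the encryption, secure-deletion, and compression tools together with Matlab.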

Figure 4-3: TP/FP analysis of Redemption based on the best threshold value.

Testing with Known/Unknown Samples. In addition to the 10-fold cross-validation
on 504 samples, we also tested Redemption with unknown benign and malicious
datasets. The tests included 29 ransomware families, of which 57% were not present
in the training dataset. We also incorporated the file system traces of benign
processes in the second time slot, as discussed in Section 4.7.1, as the unseen
benign dataset in this test. Table 4.3 lists the ransomware families we used in our
experiments. This table also shows the datasets that were used in prior work
[35, 85, 62]. In this experiment, we used the malice threshold α = 0.12, as in the
previous experiment, and manually checked the detection results to measure the FP
and TP rates. The detection results in this set of experiments are a TP rate of 100%
at an FP rate of 0.8%. Note that the number of FP cases depends on the value of the
malice threshold. We selected this conservative value to be able to detect all the
possible ransomware behaviors. Observing realistic workloads on a larger group of
machines can lead to a more comprehensive model, more accurate malice threshold
calibration, and ultimately lower FP rates. Nevertheless, our experiments on 677
ransomware samples from 29 ransomware families show that Redemption is able to
detect the malicious process in all 29 families by observing a median of 5 files.
We suspect the difference in the number of files is due to differences in the size
of the files being attacked. This is a very promising result, since the detection
rate of the system did not change when we added unknown ransomware families that do
not necessarily follow the same attack techniques (e.g., they use different
cryptosystems). The results of this experiment also show that the number of files
exposed to ransomware does not change significantly even when Redemption has not
been trained on those ransomware families. This result implies that the system can
detect a significant number of unseen ransomware attacks.
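For readers unfamiliar with the 10-fold cross-validation mentioned above, the following Python sketch shows how 504 sample indices can be partitioned into ten folds. The text does not describe how the folds were drawn, so the shuffling, the seed, and the function name `k_fold_indices` are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(504, k=10))
```

Each sample appears in exactly one test fold, so every one of the 504 samples is scored once by a model that did not see it during training.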

4.7.3 Disk I/O and File System Benchmarks

In order to evaluate the disk I/O and file system performance of Redemption, we
used IOzone [9], a well-known file system benchmark tool for Windows. To this end,
we first generated 100 × 512 MB files to test the throughput of block write, rewrite,
and read operations. Next, we tested the standard file system operations by creating
and accessing 50,200 files, each containing 1 MB of data, in multiple directories. We
ran IOzone as a normal process. For comparison, we repeated all the experiments 10
times and averaged the scores to obtain the final results. We wrote a script in
AutoIt [8] to automate the tasks. Our findings are summarized in Table 4.4.

Table 4.1: A list of benign applications and their malice scores.

Program               Min. Score   Max. Score
Adobe Photoshop       0.032        0.088
AESCrypt              0.37         0.72
AxCrypt               0.31         0.75
Adobe PDF reader      0.0          0.0
Adobe PDF Pro         0.031        0.039
Google Chrome         0.037        0.044
Internet Explorer     0.035        0.045
Matlab                0.038        0.92
MS Word               0.041        0.089
MS PowerPoint         0.025        0.102
MS Excel              0.017        0.019
VLC Player            0.0          0.0
Vera Crypt            0.33         0.71
WinRAR                0.0          0.16
Windows Backup        0.0          0.0
Windows paintit       0.029        0.083
SDelete               0.283        0.638
Skype                 0.011        0.013
Spotify               0.01         0.011
Sumatra PDF           0.022        0.041
Zip                   0.0          0.16

Malice Score Median   0.027        0.0885

The experiments show that Redemption performs well when issuing heavy reads
and writes, and imposes an overhead of 2.8% and 3.4%, respectively. However, rewrite
and create operations can experience slowdowns ranging from 7% to 9% when dealing
with a large number of small files. Creating the reflected files and redirecting
the write requests to the protected area are the main reasons for this performance
hit under high workloads. These results also suggest that Redemption might not
be suitable for workloads involving many small files, such as compiling large software
projects. However, note that such heavy workloads do not represent the deployment
cases Redemption is designed to target (i.e., protecting the end host of a typical
user who surfs the web and engages in productivity activities such as writing text
and sending emails).

Table 4.2: A list of ransomware families and their malice scores.

Family                 Samples   Min. Score   Max. Score   File Recovery
Cerber                 33        0.41         0.73         5
Cryptolocker           50        0.36         0.77         4
CryptoWall3            39        0.4          0.79         6
CryptXXX               46        0.49         0.71         3
CTB-Locker             53        0.38         0.75         7
CrypVault              36        0.53         0.73         3
CoinVault              39        0.42         0.69         4
Filecoder              54        0.52         0.66         5
GpCode                 45        0.52         0.76         2
TeslaCrypt             37        0.43         0.79         4
Virlock                29        0.51         0.72         3
SilentCrypt            43        0.31         0.59         9

Total Samples          504       -            -            -
Score Median           -         0.43         0.73         -
File Recovery Median   -         -            -            4

Family                     Redemption    CryptoDrop [85]   ShieldFS [35]   PayBreak [62]
                           Samples/FA    Samples/FA        Samples         Samples
Almalocker                 -             -                 -               1
Androm                     -             -                 -               2
Cerber                     30/6          -                 -               1
Chimera                    -             -                 -               1
CoinVault                  19/5          -                 -               -
Critroni                   16/6          -                 17              -
Crowti                     22/8          -                 -               -
CryptoDefense              42/7          18/6.5            6               -
CryptoLocker(copycat)      -             2/20              -               -
Cryptolocker               29/4          31/10             20              33
CryptoFortess              12/7          2/14              -               2
CryptoWall                 29/5          8/10              8               7
CrypWall                   -             -                 -               4
CrypVault                  26/3          -                 -               -
CryptXXX                   45/3          -                 -               -
CryptMIC                   7/3           -                 -               -
CTB-Locker                 33/6          122/29            -               -
DirtyDecrypt               8/3           -                 3               -
DXXD                       -             -                 -               2
Filecoder                  34/5          72/10             -               -
GpCode                     45/3          13/22             -               2
HDDCryptor                 13/5          -                 -               -
Jigsaw                     12/4          -                 -               -
Locky                      21/2          -                 154             7
MarsJokes                  -             -                 -               1
MBL Advisory               12/4          1/9               -               -
Petya                      32/5          -                 -               -
PayCrypt                   -             -                 3               -
PokemonGo                  -             -                 -               1
PoshCoder                  17/4          1/10              -               -
TeslaCrypt                 39/6          149/10            73              4
Thor Locky                 -             -                 -               1
TorrentLocker              21/6          1/3               12              -
Tox                        15/7          -                 -               9
Troldesh                   -             -                 -               5
Virlock                    29/7          20/8              -               4
Razy                       -             -                 -               3
SamSam                     -             -                 -               4
SilentCrypt                43/8          -                 -               -
Xorist                     14/7          51/3              -               -
Ransom-FUE                 -             1/19              -               -
WannaCry                   7/5           -                 -               -
ZeroLocker                 5/8           -                 1               -

Total Samples (Families)   677(29)       492(15)           305(11)         107(20)
File Attacked/Recovered
(FA/FR) Median             5/5           10/0              -               -

Table 4.3: The ransomware families used to test Redemption and other proposed
techniques.

Another important question is how many files should be maintained in the
protected area when Redemption is active. Since the protected area is sufficiently
large, the system can maintain several files without committing them to disk and
updating the original files. However, this approach may not be desirable in
scenarios where read operations occur immediately after write operations (e.g.,
databases). In these scenarios, Redemption must redirect read operations, in
addition to write requests, to the protected area, which is not ideal from a
usability perspective. To this end, we also benchmarked I/O on the protected area by
requesting write access to files, updating the files, and committing the changes to
the protected area without updating the original files. We created a script that
immediately generates read requests to access the updated files. The benchmark shows
that the performance overhead for read operations is less than 3.1% when 100 files
with a median file size of 17.4 MB are maintained in the protected area. This number
of files is significantly larger than the maximum number of files Redemption needs
to observe to identify a suspicious process. Note that we consider scenarios where
read operations are requested immediately after write operations in order to
exercise the redirection mechanism under high load. Based on this benchmark, we
conclude that the read redirection mechanism does not impose as significant an
overhead as we first expected. In the following, we demonstrate that Redemption
incurs minimal performance overhead when executing more realistic workloads for our
target audience.

4.7.4 Real-world Application Testing

To obtain measurable performance indicators that characterize the overhead of
Redemption, we created micro-benchmarks that exercise the critical performance paths
of Redemption. Note that developing benchmarks and custom test cases requires
careful consideration of factors that might impact the runtime measurements. For
example, a major challenge we had to tackle was automating the testing of desktop
applications with graphical user interfaces. In order to make the tests as identical
as possible on the standard and Redemption-enabled machines, we wrote scripts in
AutoIt to interact with each application while monitoring the performance impact.

Table 4.4: Disk I/O performance in a standard and a Redemption-protected host.

Operation   Original Performance   Redemption Performance   Overhead (%)
Write       112,456.25 KB/s        110,094.67 KB/s          3.4%
Rewrite     68,457.57 KB/s         62,501.76 KB/s           8.7%
Read        114,124.78 KB/s        112,070.53 KB/s          2.8%
Create      12,785 files/s         11,852 files/s           7.3%

Table 4.5: Runtime overhead of Redemption on a set of end-point applications.

Application    Original (s)   Redemption (s)   Overhead (%)
AESCrypt       165.55         173.28           4.67%
AxCrypt        182.4          191.72           5.11%
Chrome         66.19          67.02            1.25%
IE             68.58          69.73            1.67%
Media Player   118.2          118.78           0.49%
MS Paint       134.5          138.91           3.28%
MS Word        182.17         187.84           3.11%
SDelete        219.4          231.0            5.29%
Vera Crypt     187.5          196.46           4.78%
Winzip         139.7          141.39           1.21%
WinRAR         160.8          163.12           1.44%
zip            127.8          129.32           1.19%

Average        -              -                2.6%

To this end, we launched the application from within the script, and waited 5 seconds
for the program window to appear. We then automatically checked whether the GUI
of the application was the active window. The script forced the application's window
to be on top. We then started interacting with the edit control and other parts of
the program to exercise the core features of the application using the handle
returned by the AutoIt script. As in the previous experiment, we repeated each test
10 times. We present the average runtimes in Table 4.5.
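The repeat-and-average step of this measurement can be sketched as follows. The GUI automation in the experiments was written in AutoIt; the Python below only illustrates the timing loop, and both the function name `average_runtime` and the `workload` callable are hypothetical.

```python
import time
from statistics import mean


def average_runtime(workload, repetitions: int = 10, clock=time.perf_counter):
    """Run `workload` `repetitions` times and return the mean elapsed time.

    `clock` is injectable so the loop can be exercised without real delays;
    with the default clock the result is in seconds.
    """
    samples = []
    for _ in range(repetitions):
        start = clock()
        workload()
        samples.append(clock() - start)
    return mean(samples)
```

Running the same scripted workload through this loop on the standard and on the Redemption-enabled machine yields the two runtime columns that a table such as Table 4.5 compares.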

In our experiments, the overhead of protecting a system from ransomware was
under 6% in every test case, and, on average, running applications took only 2.6%
longer to complete their tasks. These results demonstrate that Redemption is
efficient, and that it should not detract from the user experience. These experiments
also support the claim that Redemption can provide real-time protection against
ransomware without a significant performance impact. We must stress that if
Redemption is deployed on machines with a primarily I/O-bound workload, lower
performance should be expected, as indicated by the benchmark in Section 4.7.3.
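The per-application figures can be recomputed from the runtimes in Table 4.5. The sketch below assumes that overhead is defined as the relative increase in runtime, a definition that reproduces the per-application percentages in the table to within rounding. The helper name `overhead_percent` is hypothetical.

```python
# (original seconds, Redemption seconds) per application, from Table 4.5.
RUNTIMES = {
    "AESCrypt": (165.55, 173.28),
    "AxCrypt": (182.4, 191.72),
    "Chrome": (66.19, 67.02),
    "IE": (68.58, 69.73),
    "Media Player": (118.2, 118.78),
    "MS Paint": (134.5, 138.91),
    "MS Word": (182.17, 187.84),
    "SDelete": (219.4, 231.0),
    "Vera Crypt": (187.5, 196.46),
    "Winzip": (139.7, 141.39),
    "WinRAR": (160.8, 163.12),
    "zip": (127.8, 129.32),
}


def overhead_percent(original: float, protected: float) -> float:
    """Relative slowdown of the protected run, in percent."""
    return (protected - original) / original * 100.0


overheads = {app: overhead_percent(o, r) for app, (o, r) in RUNTIMES.items()}
worst_app = max(overheads, key=overheads.get)
```

Under this definition the largest slowdown belongs to SDelete, and every application stays below the 6% bound stated in the text.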

4.7.5 Usability Experiments

We performed a user study with 28 participants to test the usability of
Redemption. We applied for and received an IRB waiver for our usability experiments
from the office of Human Subject Research Protection (HSRP). The goal of the
usability test was to determine whether the system provides transparent monitoring,
and to evaluate how end-users deal with our visual alerts. The participants came from
different majors at the authors' institution and were recruited by asking for
volunteers to help test a security tool. To avoid the effects of priming, the
participants were not informed about the key functionality of Redemption. The only
recruitment requirement was that participants be familiar with text editors and
web browsers so that they could perform the given tasks correctly. All the
experiments were conducted using two identical Windows 7 virtual machines with
Redemption enabled, running on two laptops. The virtual machines were given
controlled Internet access as described in Section 4.7. Redemption was configured to
be in protection mode on the entire data space generated for the test user account.
A ransomware sample was automatically started at a random time to observe how the
user interacts with Redemption during a ransomware attack. After each experiment, the
virtual machines were rolled back to the default state. No personal information was
collected from the participants at any point in the experiments.

We asked the participants to perform three tasks to evaluate different aspects
of the system. The first task was to work with an instance of Microsoft Word and
PowerPoint on the test machines running Redemption. The experiment observer
asked the participants to compare this process with their previous experience of using
Microsoft Word and PowerPoint and rate the difficulty involved in interacting with
the test setup on a 5-point Likert scale.

In the second task, the participants were asked to encrypt a folder containing
multiple files with AxCrypt on the Redemption-enabled machine. This action caused
a visual alert to be displayed, informing the participant that the operation had been
suspended and asking the user to confirm or deny the action. The participants were
asked to explain why they confirmed or denied the action.

In the last task, the participants were asked to perform a specific search on the
Internet. While they were preoccupied with the task, the ransomware sample was
automatically started. This action was blocked by Redemption and caused another
visual alert to be displayed. As in the second task, the experiment observer
monitored how participants handled the alert.

At the end of the first phase of the experiment, all 28 participants found the
experience to be identical to using Microsoft Word and PowerPoint on their own
machines. This finding empirically confirms that Redemption is transparent to the
users. In the second experiment, 26 participants confirmed the action. The other 2
noticed the alert but denied the operation, so no file was encrypted. In the third
phase, all 28 participants noticed the visual alert, and none of them confirmed the
operation. The participants explained that they were not sure why they received this
visual alert, and could not verify the operation. These results confirm that
Redemption's visual alerts are able to draw all participants' attention while they
are occupied with other tasks, and are effective in protecting the user data.
Furthermore, the experiments imply that end-users are more likely to recognize the
presence of suspicious operations on their sensitive data using Redemption's
indicators. To confirm statistical significance, we performed a hypothesis test
where the null hypothesis is that Redemption's indicators do not assist in
identifying suspicious operations during ransomware attacks, while the alternative
hypothesis is that Redemption's ransomware indicators do assist in identifying such
destructive actions. Using a paired t-test, we obtain a p-value of 4.9491 × 10^-7,
sufficient to reject the null hypothesis at a 1% significance level.

4.8 Discussion and Limitations

Unfortunately, malware research is an arms race. There is always the possibility
that malware developers find heuristics to bypass detection on the analysis
systems or on end-user machines. In the following, we discuss possible evasion
scenarios that can be used by malware authors, and how Redemption addresses them.

Attacking Redemption's Monitor.

Note that the interaction of any user-mode process, as well as of kernel-mode
drivers, with the file system is managed by the Windows I/O manager, which is
responsible for generating the appropriate I/O requests. Since every access in any
form must first be submitted to the I/O manager, and Redemption registers callbacks
for all the I/O requests, bypassing Redemption's monitor is not possible from user
mode. Furthermore, Windows has prohibited direct access to the disk or volume for
user-mode applications since Windows Vista [76] in order to protect the file
system's integrity. Therefore, no other form of request to access the files is
possible from user mode, which is guaranteed by the operating system.

Attackers may be able to use social engineering techniques and frustrate users by
creating fake alert messages (for example, accusing a browser of being ransomware)
and forcing the user to turn off Redemption. We believe these scenarios are possible.
However, note that such social engineering attacks are well-known security problems
and target all end-point security solutions, including our tool. Defending against
such attacks depends more on the security awareness of users and is out of the scope
of this work.

Attacking the Malice Score Calculation Function.

An attacker may also target the malice calculation function, and try to keep the
malice score of the process lower than the threshold. For example, an attacker can
generate code that performs selective content overwrite, uses a low-entropy payload
for content overwrite, or launches periodic file destruction. If an attacker employs
any one of these techniques by itself, the malice score becomes lower, but the
malicious action would still be distinguishable. For example, if the file content is
overwritten with a low-entropy payload, the process receives a lower malice score.
However, since the process overwrites all the content of a file with a low-entropy
payload, the behavior is itself suspicious, and would be reported to the user.
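The notion of a low-entropy versus high-entropy payload can be made concrete with the standard Shannon entropy over byte values. The sketch below only illustrates that measure; how Redemption weighs entropy inside the malice score is defined earlier in the chapter and is not reproduced here, and the function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A buffer of identical bytes scores 0 bits per byte, while a buffer in which all 256 byte values are equally frequent, as is typical of encrypted output, scores the maximum of 8 bits per byte.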

We believe that the worst-case scenario would be an attacker employing all
three techniques simultaneously to bypass the malice score calculation function. This
is a fair assumption, since developing such malware is straightforward. However, note
that in order to launch a successful ransomware attack, and force the victim to pay
the ransom fee, the malicious program needs to attack more than one file, and
preferably all the files on the system. Hence, even if the malicious program employs
all of the bypassing techniques, it requires some sort of iteration with write
permission over the user files. This action would still be seen and captured by
Redemption. In this particular case, a malicious program can successfully encrypt a
single user file, but the subsequent write attempt on another file would be reported
to the user for confirmation if the write request occurs within a pre-defined
six-hour period after the first attempt. This means that ransomware can successfully
encrypt one user file every six hours. We should stress that, in this particular
scenario, the system cannot guarantee zero data loss. However, the system
significantly decreases the effectiveness of the attack, since the number of files
encrypted per day is very small.
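The six-hour rule described above can be isolated in a few lines. This is a sketch of the time-window check only, under the assumption that a reported write is held for confirmation and therefore does not reset the window. The class `WriteWindow` and its method are hypothetical names, and the real system combines this rule with the malice score.

```python
SIX_HOURS = 6 * 60 * 60  # window length in seconds


class WriteWindow:
    """Tracks, per process, the last user file written and when."""

    def __init__(self, window: float = SIX_HOURS):
        self.window = window
        self.last = {}  # pid -> (path, timestamp of last allowed write)

    def needs_confirmation(self, pid: int, path: str, now: float) -> bool:
        previous = self.last.get(pid)
        if previous is not None:
            prev_path, prev_time = previous
            if path != prev_path and (now - prev_time) < self.window:
                # A different file inside the window: report to the user
                # and leave the recorded write unchanged.
                return True
        self.last[pid] = (path, now)
        return False


# Upper bound on files a maximally stealthy process could overwrite per day.
FILES_PER_DAY = (24 * 60 * 60) // SIX_HOURS
```

Under these assumptions a process that spaces its writes exactly one window apart is never reported, which gives the bound of four files per day implied by the text.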

Furthermore, since these approaches incur a considerable delay in launching a
successful attack, they also increase the risk of being detected by AV scanners on
the end-point before the malware can encrypt a large number of files and force the
user to pay. Consequently, developing such stealthy ransomware may not be as
profitable as current ransomware attack strategies, where the entire point of the
attack is to encrypt as many files as possible in a short period of time and request
money. An attacker may also avoid encrypting user files, and only lock the desktop
once installed. This approach can make the end-user machine inaccessible. However,
such changes are not persistent, and regaining access to the machine is
significantly easier. Such attacks are out of the scope of this work.

4.9 Conclusions

In this work, we proposed a generic approach, called Redemption, to defend against
ransomware on the end-host. We showed that by incorporating the prototype of
Redemption as an augmented service to the operating system, it is possible to
successfully stop ransomware attacks on end-user machines. We showed that the system
incurs modest overhead, averaging 2.6% for realistic workloads. Furthermore,
Redemption does not require explicit application support or any other preconditions
to actively protect users against unknown ransomware attacks. We provide an
anonymous video of Redemption in action in [14], and hope that the concepts we
propose will be useful for end-point protection providers.

Chapter 5

Conclusions

Malware research is an adversarial field. Adversaries strive to develop new
techniques to evade common detection techniques and successfully run their attacks.
The evolving nature of these attacks requires the security community to constantly
monitor new trends and techniques used by attackers, and to develop novel defense
mechanisms. However, to ensure that the proposed solutions can be successfully
incorporated on the defense side, the proposed techniques should enhance modern
detection techniques and improve the anti-evasion capabilities of current solutions
while maintaining usability and imposing low overhead. As most of today's ransomware
attacks mainly target end-users, any anti-ransomware technique needs to satisfy these
requirements in order to achieve widespread adoption, either as an end-point solution
or as a tool for research and analysis purposes.

In this thesis, in light of the above considerations, our proposed solutions are
designed to satisfy these requirements. We illustrated that our proposed solutions can
enhance state-of-the-art techniques while being resistant to common evasion
techniques and imposing low overhead on the underlying systems.

In Chapter 2, we looked at 1,359 current ransomware samples from 15 ransomware
families by monitoring how these attacks target users. We developed a kernel driver
that can be attached to the filesystem to monitor how a malicious process targets
user data. Our analysis showed that, contrary to most of the security reports on
these attacks, the malicious payloads work very similarly among different classes of
ransomware attacks, and that developing a defense mechanism against these attacks is
possible.

In Chapter 3, we investigated the challenges of analyzing ransomware attacks, and
developed an analysis environment to detect and study unknown ransomware attacks.
We proposed Unveil, a sandbox designed specifically to detect and analyze
ransomware, which helps reverse engineers gain insight into the internal
behavior of ransomware attacks.

In Chapter 4, we discussed the shortcomings of current solutions for protecting
end-users from ransomware attacks, and proposed Redemption, an end-point solution
that detects ransomware attacks while achieving zero data loss. We provided a generic
system architecture and implemented a prototype for the Microsoft Windows operating
system. We performed several experiments to show that Redemption meets the
requirements we discussed earlier in this chapter.

Acknowledgments. The research presented in Chapter 2 is based on the author's
previously published work:

Amin Kharraz, William Robertson, Davide Balzarotti, Leyla Bilge, Engin Kirda.
Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks. 12th
International Conference on Detection of Intrusions and Malware, and Vulnerability
Assessment (DIMVA). Milan, Italy, 2015.

The research presented in Chapter 3 is based on the author's previously published
paper:

Amin Kharraz, Sajjad Arshad, Collin Mulliner, William Robertson, Engin Kirda.
UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware. USENIX
Security Symposium 2016. Austin, Texas, August 2016.

The research presented in Chapter 4 is based on the author's previously published
paper:

Amin Kharraz, Engin Kirda. Redemption: Real-time Protection Against Ransomware
at End-Hosts. The 20th International Symposium on Research in Attacks, Intrusions
and Defenses (RAID 2017). Atlanta, Georgia, September 2017.

The author was also involved in a set of research papers that are not directly
related to this thesis [19, 20, 55], and would like to thank the collaborators for
producing great work in this area.

Bibliography

[1] Minotaur Analysis - Malware Repository. minotauranalysis.com.

[2] VX Vault - Online Repository of Malware Samples. vxvault.siri-urz.net.

[3] Malware Tips - Your Security Advisor. http://malwaretips.com/forums/

virus-exchange.104/.

[4] MalwareBlackList - Online Repository of Malicious URLs. http://www.

malwareblacklist.com.

[5] Proof-of-concept Automated Baremetal Malware Analysis Framework. https://code.google.com/p/nvmtrace/.

[6] Police ransomware threat assessment. Europol Public Information, 2014.

[7] BitBlaze Malware Analysis Service. http://bitblaze.cs.berkeley.edu/,2016.

[8] AutoIt. https://www.autoitscript.com/site/autoit/, 2016.

[9] IOzone Filesystem Benchmark. www.iozone.org, 2016.

[10] SilentCrypt: A new ransomware family. https://www.youtube.com/watch?

v=qiASKA4BMck, 2016.

[11] Anand Ajjan. Ransomware: Next-Generation Fake Antivirus.http://www.sophos.com/en-us/medialibrary/PDFs/technicalpapers/

SophosRansomwareFakeAntivirus.pdf, 2013.

[12] Alex Hern. Major sites including New York Times and BBC hit by ransomwaremalvertising. https://www.theguardian.com/technology/2016/mar/16/

major-sites-new-york-times-bbc-ransomware-malvertising, 2016.

[13] Alex Hern. Ransomware threat on the rise as almost 40 percent of businesses attacked. https://www.theguardian.com/technology/2016/aug/03/ransomware-threat-on-the-rise-as-40-of-businesses-attacked, 2016.

[14] Amin Kharraz. A brief demo on how Redemption operates. https://www.youtube.com/watch?v=iuEgFVz7a7g, 2016.

[15] Andrew Dalton. Hospital paid 17K ransom to hackers of its computer network. http://bigstory.ap.org/article/d89e63ffea8b46d98583bfe06cf2c5af/hospital-paid-17k-ransom-hackers-its-computer-network, 2016.

[16] Manos Antonakakis, Roberto Perdisci, David Dagon, Wenke Lee, and Nick Feamster. Building a dynamic reputation system for dns. In Proceedings of the 19th USENIX conference on Security, pages 18–18. USENIX Association, 2010.

[17] Manos Antonakakis, Roberto Perdisci, Wenke Lee, Nikolaos Vasiloglou, II, and David Dagon. Detecting malware domains at the upper dns hierarchy. In Proceedings of the 20th USENIX Conference on Security, SEC'11, pages 27–27, Berkeley, CA, USA, 2011. USENIX Association.

[18] Manos Antonakakis, Roberto Perdisci, Yacin Nadji, Nikolaos Vasiloglou, Saeed Abu-Nimeh, Wenke Lee, and David Dagon. From throw-away traffic to bots: Detecting the rise of dga-based malware. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), pages 491–506, Bellevue, WA, 2012. USENIX.

[19] Sajjad Arshad, Amin Kharraz, and William Robertson. Identifying extension-based ad injection via fine-grained web content provenance. In Proceedings of the 19th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), September 2016.

[20] Sajjad Arshad, Amin Kharraz, and William Robertson. Include me out: In-browser detection of malicious third-party content inclusions. In Proceedings of the 20th International Conference on Financial Cryptography and Data Security (FC), February 2016.

[21] Ulrich Bayer, Christopher Kruegel, and Engin Kirda. TTAnalyze: A Tool for Analyzing Malware. In Proceedings of the European Institute for Computer Antivirus Research Annual Conference, April 2006.

[22] BBC News. University pays 20,000 Dollars to ransomware hackers. http://www.bbc.com/news/technology-36478650, 2016.

[23] Blockchain.info. Bitcoin Block Explorer. https://blockchain.info.

[24] Brian M Bowen, Shlomo Hershkop, Angelos D Keromytis, and Salvatore J Stolfo. Baiting inside attackers using decoy documents. Springer, 2009.

[25] Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Detecting self-mutating malware using control-flow graph matching. In Detection of Intrusions and Malware & Vulnerability Assessment, pages 129–143. Springer, 2006.

[26] Brian Carrier. File System Forensic Analysis. Addison-Wesley Professional, 2005.

[27] Catalin Cimpanu. Breaking Bad Ransomware Completely Undetected by VirusTotal. http://news.softpedia.com/news/breaking-bad-ransomware-goes-completely-undetected-by-virustotal-493265.shtml, 2015.

[28] Charlie Osborne. Researchers launch another salvo at CryptXXX ransomware. http://www.zdnet.com/article/researchers-launch-another-salvo-at-cryptxxx-ransomware/, 2016.

[29] Chris Francescani. Ransomware Hackers Blackmail U.S. Police Departments. http://www.cnbc.com/2016/04/26/ransomware-hackers-blackmail-us-police-departments.html, 2016.

[30] Nicolas Christin. Traveling the silk road: A measurement analysis of a large anonymous online marketplace. In Proceedings of WWW 2013, May 2013.

[31] Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Mining specifications of malicious behavior. In Proceedings of the 1st India software engineering conference, pages 5–14. ACM, 2008.

[32] Mihai Christodorescu, Somesh Jha, Sanjit A Seshia, Dawn Song, and Randal E Bryant. Semantics-aware malware detection. In Security and Privacy, 2005 IEEE Symposium on, pages 32–46. IEEE, 2005.

[33] Cisco, Inc. Ransomware on Steroids: Cryptowall 2.0. http://blogs.cisco.com/security/talos/cryptowall-2, 2015.

[34] Connor Mannion. Three U.S. Hospitals Hit in String of Ransomware Attacks. http://www.nbcnews.com/tech/security/three-u-s-hospitals-hit-string-ransomware-attacks-n544366, 2016.

[35] Andrea Continella, Alessandro Guagnelli, Giovanni Zingaro, Giulio De Pasquale, Alessandro Barenghi, Stefano Zanero, and Federico Maggi. Shieldfs: a self-healing, ransomware-aware filesystem. In Proceedings of the 32nd Annual Conference on Computer Security Applications, pages 336–347. ACM, 2016.

[36] Marco Cova, Corrado Leita, Olivier Thonnard, Angelos D. Keromytis, and Marc Dacier. An Analysis of Rogue AV Campaigns. In Proceedings of the International Conference on Recent Advances in Intrusion Detection, pages 442–463, 2010.

[37] Cuckoo Foundation. Cuckoo Sandbox: Automated Malware Analysis. www.cuckoosandbox.org, 2015.

[38] Dan Whitcomb. California lawmakers take step toward outlawing ransomware. http://www.reuters.com/article/us-california-ransomware-idUSKCN0X92PA, 2016.

[39] Dell SecureWorks. Cryptolocker Ransomware. http://www.secureworks.com/cyber-threat-intelligence/threats/cryptolocker-ransomware/, 2014.

[40] Dell SecureWorks. University of Calgary paid 20K in ransomware attack. http://www.cbc.ca/news/canada/calgary/university-calgary-ransomware-cyberattack-1.3620979, 2016.

[41] Brian Donohue. Reveton Ransomware Adds Password Purloining Function. http://threatpost.com/reveton-ransomeware-adds-password-purloining-function/100712, 2013.

[42] Fergal Reid and Martin Harrigan. An analysis of anonymity in the bitcoin system. In Security and Privacy in Social Networks, 2012.

[43] Alexandre Gazet. Comparative analysis of various ransomware virii. Journal in Computer Virology, 6(1):77–90, February 2010.

[44] Gregory Wolf. 8 High Profile Ransomware Attacks You May Not Have Heard Of. https://www.linkedin.com/pulse/8-high-profile-ransomware-attacks-you-may-have-heard-gregory-wolf, 2016.

[45] Chris Grier, Lucas Ballard, Juan Caballero, Neha Chachra, Christian J Dietrich, Kirill Levchenko, Panayiotis Mavrommatis, Damon McCoy, Antonio Nappa, Andreas Pitsillidis, et al. Manufacturing compromise: the emergence of exploit-as-a-service. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 821–832, 2012.

[46] Greg Hoglund and Jamie Butler. Rootkits: Subverting the Windows Kernel. Addison-Wesley Professional, 2005.

[47] International Secure System Lab. Anubis - Malware Analysis for Unknown Binaries. https://anubis.iseclab.org/, 2015.

[48] Joshua Tully. An Anti-Reverse Engineering Guide. http://www.codeproject.com/Articles/30815/An-Anti-Reverse-Engineering-Guide#StolenBytes, 2008.

[49] Jerry Zremski. New York Senator Seeks to Combat Ransomware. http://www.govtech.com/security/New-York-Senator-Seeks-to-Combat-Ransomware.html, 2016.

[50] Ari Juels and Ronald L Rivest. Honeywords: Making password-cracking detectable. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 145–160. ACM, 2013.

[51] Yuhei Kawakoya, Makoto Iwamura, Eitaro Shioji, and Takeo Hariu. Api chaser: Anti-analysis resistant malware analyzer. In Research in Attacks, Intrusions, and Defenses, pages 123–143. Springer, 2013.

[52] Kevin Savage, Peter Coogan, and Hon Lau. The Evolution of Ransomware. http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/the-evolution-of-ransomware.pdf, 2015.

[53] Amin Kharraz, Sajjad Arshad, Collin Mulliner, William Robertson, and Engin Kirda. UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware. In 25th USENIX Security Symposium, 2016.

[54] Amin Kharraz and Engin Kirda. Redemption: Real-time protection against ransomware at end-hosts. In Proceedings of the 20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), September 2017.

[55] Amin Kharraz, Engin Kirda, William Robertson, Davide Balzarotti, and Aurelien Francillon. Optical Delusions: A Study of Malicious QR Codes in the Wild. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2014.

[56] Amin Kharraz, William Robertson, Davide Balzarotti, Leyla Bilge, and Engin Kirda. Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), July 2015.

[57] Dhilung Kirat, Giovanni Vigna, and Christopher Kruegel. Barebox: efficient malware analysis on bare-metal. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 403–412. ACM, 2011.

[58] Dhilung Kirat, Giovanni Vigna, and Christopher Kruegel. Barecloud: Bare-metal analysis-based evasive malware detection. In 23rd USENIX Security Symposium (USENIX Security 14), pages 287–301. USENIX Association, 2014.

[59] Engin Kirda, Christopher Kruegel, Greg Banks, Giovanni Vigna, and Richard Kemmerer. Behavior-based spyware detection. In Usenix Security, volume 6, 2006.

[60] Clemens Kolbitsch, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda, Xiao-yong Zhou, and XiaoFeng Wang. Effective and efficient malware detection at the end host. In USENIX security symposium, pages 351–366, 2009.

[61] Clemens Kolbitsch, Engin Kirda, and Christopher Kruegel. The power of procrastination: detection and mitigation of execution-stalling malicious code. In Proceedings of the 18th ACM conference on Computer and communications security, pages 285–296. ACM, 2011.

[62] Eugene Kolodenker, William Koch, Gianluca Stringhini, and Manuel Egele. Paybreak: Defense against cryptographic ransomware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 599–611, New York, NY, USA, 2017. ACM.

[63] Brian Krebs. Inside a Reveton Ransomware Operation. http://krebsonsecurity.com/2012/08/inside-a-reveton-ransomware-operation/, 2012.

[64] Brian Krebs. FBI: North Korea to Blame for Sony Hack. http://krebsonsecurity.com/2014/12/fbi-north-korea-to-blame-for-sony-hack/, 2014.

[65] Christopher Kruegel, Engin Kirda, Darren Mutz, William Robertson, and Giovanni Vigna. Polymorphic worm detection using structural information of executables. In Recent Advances in Intrusion Detection, pages 207–226. Springer, 2006.

[66] Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. Accessminer: Using system-centric models for malware protection. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS '10, pages 399–412. ACM, 2010.

[67] Lawrence Abrams. TeslaCrypt Decrypted: Flaw in TeslaCrypt allows Victim's to Recover their Files. http://www.bleepingcomputer.com/news/security/teslacrypt-decrypted-flaw-in-teslacrypt-allows-victims-to-recover-their-files/, 2016.

[68] Wei-Jen Li, Ke Wang, Salvatore J Stolfo, and Benjamin Herzog. Fileprints: Identifying file types by n-gram analysis. In Information Assurance Workshop, 2005. IAW'05. Proceedings from the Sixth Annual IEEE SMC, pages 64–71. IEEE, 2005.

[69] Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Information theory, 37:145–151, 1991.

[70] Martina Lindorfer, Clemens Kolbitsch, and Paolo Milani Comparetti. Detecting environment-sensitive malware. In Recent Advances in Intrusion Detection, pages 338–357. Springer, 2011.

[71] Malware Don't Need Coffee. Guess who's back again? Cryptowall 3.0. http://malware.dontneedcoffee.com/2015/01/guess-whos-back-again-cryptowall-30.html, 2015.

[72] Lorenzo Martignoni, Elizabeth Stinson, Matt Fredrikson, Somesh Jha, and John C Mitchell. A layered architecture for detecting malicious behaviors. In Recent Advances in Intrusion Detection, pages 78–97. Springer, 2008.

[73] McAfee Labs. McAfee Labs 2017 Threat Predictions Report. https://www.mcafee.com/us/resources/reports/rp-threats-predictions-2017.pdf, 2017.

[74] Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. A fistful of bitcoins: Characterizing payments among men with no names. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC '13, pages 127–140, 2013.

[75] Michael Mimoso. Leaked NSA Exploit Spreading Ransomware WorldWide. https://threatpost.com/leaked-nsa-exploit-spreading-ransomware-worldwide/125654/, 2017.

[76] Microsoft, Inc. Blocking Direct Write Operations to Volumes and Disks. https://msdn.microsoft.com/en-us/library/windows/hardware/ff551353(v=vs.85).aspx.

[77] Microsoft, Inc. Microsoft Security Intelligence Report Vol. 16. http://www.microsoft.com/security/sir/default.aspx, 2013.

[78] Microsoft, Inc. File System Minifilter Drivers. https://msdn.microsoft.com/en-us/library/windows/hardware/ff540402%28v=vs.85%29.aspx, 2014.

[79] Microsoft, Inc. Protecting Anti-Malware Services. https://msdn.microsoft.com/en-us/library/windows/desktop/dn313124(v=vs.85).aspx, 2016.

[80] Malte Möser. Anonymity of bitcoin transactions: An analysis of mixing services. In Proceedings of Monster Bitcoin Conference, 2013.

[81] Ms. Smith. Kansas Heart Hospital hit with ransomware; attackers demand two ransoms. http://www.networkworld.com/article/3073495/security/kansas-heart-hospital-hit-with-ransomware-paid-but-attackers-demanded-2nd-ransom.html, 2016.

[82] Terry Nelms, Roberto Perdisci, and Mustaque Ahamad. Execscent: Mining for new c&c domains in live networks with adaptive control protocol templates. In USENIX Security, pages 589–604, 2013.

[83] Nick Nikiforakis, Marco Balduzzi, S. Van Acker, W. Joosen, and Davide Balzarotti. Exposing the lack of privacy in file hosting services. In Proceedings of the 4th USENIX conference on Large-scale exploits and emergent threats (LEET), LEET 11. USENIX Association, March 2011.

[84] No-More-Ransomware Project. No More Ransomware! https://www.nomoreransom.org/about-the-project.html, 2016.

[85] Nolen Scaife, Henry Carter, Patrick Traynor, and Kevin Butler. CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data. In IEEE International Conference on Distributed Computing Systems (ICDCS), 2016.

[86] Gavin O'Gorman and Geoff McDonald. Ransomware: A Growing Menace. http://www.symantec.com/connect/blogs/ransomware-growing-menace, 2012.

[87] Payload Security, Inc. Payload Security. https://www.hybrid-analysis.com, 2016.

[88] Brian Prince. CryptoLocker Could Herald Rise of More Sophisticated Ransomware. http://www.darkreading.com/attacks-breaches/cryptolocker-could-herald-rise-of-more-sophisticated-ransomware, 2013.

[89] QuickBT. Disturbing Bitcoin Virus. http://www.reddit.com/r/Bitcoin/comments/1o53hl/, October 2013.

[90] Babak Rahbarinia, Roberto Perdisci, and Manos Antonakakis. Segugio: Efficient behavior-based tracking of malware-control domains in large isp networks. In DSN, pages 403–414. IEEE, 2015.

[91] Ray Smith. Tesseract Open Source OCR Engine. https://github.com/tesseract-ocr/tesseract, 2015.

[92] REAQTA, Inc. HydraCrypt Ransomware. https://reaqta.com/2016/02/hydracrypt-ransomware/, 2016.

[93] Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction graph. In Financial Cryptography and Data Security 2013, volume 7859, pages 6–24, 2013.

[94] Christian Rossow, Christian J Dietrich, Chris Grier, Christian Kreibich, Vern Paxson, Norbert Pohlmann, Herbert Bos, and Maarten Van Steen. Prudent practices for designing malware experiments: Status quo and outlook. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 65–79. IEEE, 2012.

[95] Matthew G Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J Stolfo. Data mining methods for detection of new malicious executables. In Security and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Symposium on, pages 38–49. IEEE, 2001.

[96] Sophos, Inc. Security Threat Report 2014, Smarter, Shadier, Stealthier Malware. http://www.sophos.com/en-us/medialibrary/PDFs/other/sophos-security-threat-report-2014.pdf, 2014.

[97] Michele Spagnuolo, Federico Maggi, and Stefano Zanero. BitIodine: Extracting intelligence from the bitcoin network. In Financial Cryptography and Data Security, Lecture Notes in Computer Science (LNCS). Springer-Verlag, March 2014.

[98] Elizabeth Stinson and John C Mitchell. Characterizing bots remote control behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 89–108. Springer, 2007.

[99] Brett Stone-Gross, Ryan Abman, Richard A. Kemmerer, Christopher Kruegel, Douglas G. Steigerwald, and Giovanni Vigna. The Underground Economy of Fake Antivirus Software. In Proceedings of the Workshop on the Economics of Information Security and Privacy, 2013.

[100] Andrew H Sung, Jianyun Xu, Patrick Chavez, and Srinivas Mukkamala. Static analyzer of vicious executables (save). In Computer Security Applications Conference, 2004. 20th Annual, pages 326–334. IEEE, 2004.

[101] Symantec, Inc. Internet Security Threat Report. http://www.symantec.com/security_response/publications/threatreport.jsp, 2017.

[102] The Cyber Threat Alliance. Lucrative Ransomware Attacks: Analysis of Cryptowall Version 3 Threat.

[103] TrendLabs. An Onslaught of Online Banking Malware and Ransomware. http://apac.trendmicro.com/cloud-content/apac/pdfs/security-intelligence/reports/rpt-cashing-in-on-digital-information.pdf, 2013.

[104] Amit Vasudevan and Ramesh Yerraballi. Cobra: Fine-grained malware analysis using stealth localized-executions. In Security and Privacy, 2006 IEEE Symposium on, 2006.

[105] Giovanni Vigna. From Anubis and Wepawet to Llama. http://info.lastline.com/blog/from-anubis-and-wepawet-to-llama, June 2014.

[106] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, 2004.

[107] WIRED Magazine. Why Hospitals Are the Perfect Targets for Ransomware. https://www.wired.com/2016/03/ransomware-why-hospitals-are-the-perfect-targets/, 2016.

[108] J-Y Xu, Andrew H Sung, Patrick Chavez, and Srinivas Mukkamala. Polymorphic malicious executable scanner by api sequence analysis. In Hybrid Intelligent Systems, 2004. HIS'04. Fourth International Conference on, pages 378–383. IEEE, 2004.

[109] Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM conference on Computer and communications security, pages 116–127. ACM, 2007.

[110] Adam Young and Moti Yung. Cryptovirology: Extortion-based security threats and countermeasures. In Security and Privacy, 1996. Proceedings., 1996 IEEE Symposium on, pages 129–140. IEEE, 1996.

[111] Adam L. Young. Building a Cryptovirus Using Microsoft's Cryptographic API. In Proceedings of the International Conference on Information Security, pages 389–401, 2005.

[112] Jim Yuill, Mike Zappe, Dorothy Denning, and Fred Feer. Honeyfiles: deceptive files for intrusion detection. In Information Assurance Workshop, 2004. Proceedings from the Fifth Annual IEEE SMC, pages 116–122. IEEE, 2004.
