malware guard extension: using sgx to conceal …at this point anti-virus scanners can hook in to...

Malware Guard Extension:Using SGX to Conceal Cache Attacks

(Extended Version)Michael Schwarz

Graz University of TechnologyEmail: [email protected]

Samuel WeiserGraz University of Technology

Email: [email protected]

Daniel GrussGraz University of Technology


Clementine MauriceGraz University of Technology


Stefan MangardGraz University of Technology


Abstract—In modern computer systems, user processes areisolated from each other by the operating system and thehardware. Additionally, in a cloud scenario it is crucial that thehypervisor isolates tenants from other tenants that are co-locatedon the same physical machine. However, the hypervisor does notprotect tenants against the cloud provider and thus the suppliedoperating system and hardware. Intel SGX provides a mechanismthat addresses this scenario. It aims at protecting user-levelsoftware from attacks from other processes, the operating system,and even physical attackers.

In this paper, we demonstrate fine-grained software-basedside-channel attacks from a malicious SGX enclave targetingco-located enclaves. Our attack is the first malware runningon real SGX hardware, abusing SGX protection features toconceal itself. Furthermore, we demonstrate our attack bothin a native environment and across multiple Docker containers.We perform a Prime+Probe cache side-channel attack on a co-located SGX enclave running an up-to-date RSA implementationthat uses a constant-time multiplication primitive. The attackworks although in SGX enclaves there are no timers, no largepages, no physical addresses, and no shared memory. In a semi-synchronous attack, we extract 96 % of an RSA private keyfrom a single trace. We extract the full RSA private key in anautomated attack from 11 traces within 5 minutes.

I. INTRODUCTION

Modern operating systems isolate user processes from eachother to protect secrets in different processes. Such secretsinclude passwords stored in password managers or privatekeys to access company networks. Leakage of these secretscan compromise both private and corporate systems. Similarproblems arise in the cloud. Therefore, cloud providers usevirtualization as an additional protection using a hypervisor.The hypervisor isolates different tenants that are co-located onthe same physical machine. However, the hypervisor does notprotect tenants against a possibly malicious cloud provider.

Although hypervisors provide functional isolation, side-channel attacks are often not considered. Consequently, re-searchers have demonstrated various side-channel attacks, es-pecially those exploiting the cache [1]. Cache side-channel

attacks can recover cryptographic secrets, such as AES [2],[3] and RSA [4] keys, across virtual machine boundaries.

Intel introduced a new hardware extension SGX (SoftwareGuard Extensions) [5] in their CPUs, starting with the Skylakemicroarchitecture. SGX is an isolation mechanism, aiming atprotecting code and data from modification or disclosure evenif all privileged software is malicious [6]. This protectionuses special execution environments, so-called enclaves, whichwork on memory areas that are isolated from the operating sys-tem by the hardware. The memory area used by the enclavesis encrypted to protect the application’s secrets from hardwareattackers. Typical use cases include password input, passwordmanagers, and cryptographic operations. Intel recommendsstoring cryptographic keys inside enclaves and claims thatside-channel attacks “are thwarted since the memory is pro-tected by hardware encryption” [7].

Apart from protecting software, the hardware-supportedisolation led to fear of super malware inside enclaves.Rutkowska [8] outlined a scenario where a benign-lookingenclave fetches encrypted malware from an external server anddecrypts and executes it within the enlave. In this scenario,it is impossible to debug, reverse engineer, or in any otherway analyze the executed malware. Aumasson et al. [9] andCostan et al. [6] eliminated this fear by arguing that enclavesalways run with user space privileges and can neither issuesyscalls nor perform any I/O operations. Moreover, SGXis a highly restrictive environment for implementing cacheside-channel attacks. Both state-of-the-art malware and side-channel attacks rely on several primitives that are not availablein SGX enclaves. Consequently, no enclave malware has beendemonstrated on real hardware so far.

In this paper, we show that it is very well possible forenclave malware to attack its hosting system. We demonstratea cache attack from within a malicious enclave that is ex-tracting secret keys from co-located enclaves. Our proof-of-concept malware is able to recover RSA keys by monitoringcache access patterns of an RSA signature process in a semi-

arX

iv:1

702.

0871

9v3

[cs

.CR

] 2

2 M

ay 2

019

synchronous attack. The malware code is completely invisibleto the operating system and cannot be analyzed due to theisolation provided by SGX. In an even stronger attack scenario,we show that an additional isolation using Docker containersdoes not protect against this kind of attack.

We make the following contributions:1) We demonstrate that, despite the restrictions of SGX,

cache attacks can be performed from within an enclaveto attack a co-located enclave.

2) By combining DRAM and cache side channels, wepresent a novel approach to recover physical address bitseven if 2MB pages are unavailable.

3) We show that it is possible to have highly accuratetimings within an enclave without access to the nativetimestamp counter, which is even more accurate than thenative one.

4) We demonstrate a fully automated end-to-end attack onthe RSA implementation of the wide-spread mbedTLSlibrary. We extract 96% of an RSA private key froma single trace and the full key from 11 traces within 5minutes.

Section II presents the background required for our work.Section III outlines the threat model and our attack scenario.Section IV describes the measurement methods and the onlinephase of the malware. Section V explains the key recoverytechniques used in the offline phase. Section VI evaluates theattack against an up-to-date RSA implementation. Section VIIdiscusses several countermeasures. Section VIII concludes ourwork.

II. BACKGROUND

A. Intel SGX in Native and Virtualized Environments

Intel Software Guard Extensions (SGX) are a new set of x86instructions introduced with the Skylake microarchitecture.SGX allows protecting the execution of user programs in so-called enclaves. Only the enclave can access its own memoryregion, any other access to it is blocked by the CPU. As SGXenforces this policy in hardware, enclaves do not need to relyon the security of the operating system. In fact, with SGX theoperating system is generally not trusted. By doing sensitivecomputation inside an enclave, one can effectively protectagainst traditional malware, even if such malware has obtainedkernel privileges. Furthermore, it allows running secret codein a cloud environment without trusting the cloud provider’shardware and operating system.

An enclave resides in the virtual memory area of an ordinaryapplication process. When creating an enclave, a virtual mem-ory region is reserved for the enclave. This virtual memoryregion can only be backed by physically protected pagesfrom the so-called Enclave Page Cache (EPC). In SGX, theoperating system is in charge of mapping EPC pages correctly.However, any invalid or malicious page mapping is detectedby the CPU to maintain enclave protection. The EPC itselfis a contiguous physical block of memory in DRAM that istransparently encrypted using a dedicated hardware encryption

module. This protects enclaves against hardware attacks tryingto read or manipulate enclave content in DRAM.

Creation and loading of enclaves are done by the oper-ating system. To protect the integrity of enclave code, theloading procedure is measured by the CPU. If the resultingmeasurement does not match the value specified by the enclavedeveloper, the CPU will refuse to run the enclave. Duringenclave loading, the operating system has full access to theenclave binary. At this point anti-virus scanners can hook into analyze the enclave binary before it is executed. Enclavemalware will attempt to hide from anti-virus scanners byencrypting malicious payload.

Since enclave code is known to the (untrusted) operatingsystem, it cannot carry hard-coded secrets. Any secret informa-tion might only be provisioned to the enclave during runtime.Before giving secrets to an enclave, a provisioning party hasto ensure that the enclave has not been tampered with. SGXtherefore provides remote attestation, which proves correctenclave loading via the aforementioned enclave measurement.

SGX comes in two versions. SGX1 specifies basic enclaveoperation. Moreover, all enclave memory pages have to beallocated at enclave creation. To account for limited memoryresources, enclave pages can be swapped out and in at runtime.SGX2 extends SGX with dynamic memory management,allowing to allocate new enclave pages at runtime. However,we do not use SGX2 features and thus presume that our attackis applicable to SGX2 as well.

At the time of writing, no hypervisor with SGX supportwas available to us. However, Docker [10] has support forIntel’s SGX. Docker is an operating-system-level virtualizationsoftware that allows applications with all their dependenciesto be packed into one container. It has emerged as a standardruntime for containers on Linux and can be used on multiplecloud providers. Unlike virtual machines, Docker containersshare the kernel and other resources with the host system,requiring fewer resources than a virtual machine. Dockerisolates processes from each other but does not give a fullisolation guarantee such as virtual machines. Arnautov et al.[11] proposed to combine Docker containers with SGX tocreate secure containers.

B. Microarchitectural Attacks

Microarchitectural attacks exploit hardware properties thatallow inferring information on other processes running on thesame system. In particular, cache attacks exploit the timingdifference between the CPU cache and the main memory. Theyhave been the most studied microarchitectural attacks for thepast 20 years, and were found to be powerful attacks able toderive cryptographic secrets [12]–[15].

While early attacks focused on the L1 caches, more modernattacks target the last-level cache, which is shared among allCPU cores. Last-level caches (LLC) are usually built as n-wayset-associative caches. They consist of S cache sets and eachcache set consists of n cache ways with a size of 64B. Thephysical address determines to which cache set and byte offseta variable maps. The lowest 6 bits determine the byte offset

Address 0

...Address n

Fig. 1. The access pattern to the eviction set. Addresses 0 to n map to thesame cache set.

within a cache way, the following log2 S bits starting with bit6 determine the cache set. Only the cache way is not derivedfrom the physical address but chosen by the CPU using itscache replacement policy.

Prime+Probe is a cache attack technique that has firstbeen used by Osvik et al. [16]. In a Prime+Probe attack,the attacker constantly primes (i.e., evicts) a cache set andmeasures how long this step took. The amount of time theprime step took is correlated to the number of cache ways inthis cache set that have been replaced by other programs. Thisallows deriving whether or not a victim application performeda specific secret-dependent memory access. Recent work hasshown that this technique can even be used across virtualmachine boundaries [17]–[21].

To prime (i.e., evict) a cache set, the attacker needs naddresses that map to the same cache set (i.e., an evictionset), where n depends on the cache replacement policy and thenumber of ways of the last-level cache. On Intel CPUs beforeIvy Bridge, the cache replacement policy was Least-RecentlyUsed (LRU), and thus it was sufficient to access n addressesfor an n-way cache. However, on newer microarchitectures,the exact cache replacement policy is unknown. To minimizethe amount of time the prime step takes, it is necessary tofind a minimal n combined with a fast access pattern (i.e.,an eviction strategy). Gruss et al. [22] experimentally foundefficient eviction strategies with high eviction rates and a smallnumber of addresses. We use their eviction strategy on ourSkylake test machine throughout the paper. Figure 1 showsthe eviction set access pattern of this eviction strategy.

A more powerful cache attack technique is Flush+Reloadby Yarom and Falkner [23]. For a Flush+Reload attack,attacker and victim need to share memory (i.e., a sharedlibrary or page deduplication). The attacker flushes a sharedmemory line from the cache to then measure the amount oftime it takes to reload the cache line. This reveals whether ornot another program reloaded this exact cache line. AlthoughFlush+Reload attacks have been studied extensively [2], [3],[20], [24]–[29] they are now considered impractical in thecloud as most cloud providers disabled page deduplication andthus disabled the only way to obtain shared memory in thecloud.

Pessl et al. [30] found another attack vector that can yieldan accuracy close to a Flush+Reload attack without requiringshared memory. They attack the DRAM modules that areshared by all virtual machines running on the same host sys-tem. Each DRAM module has a row buffer that holds the mostrecently accessed DRAM row. While accesses to this bufferare fast, accesses to other memory locations in DRAM are

Algorithm 1: Square-and-multiply exponentiationinput : base b, exponent e , modulus noutput: be mod n

X ← 1;for i ← bitlen(e) downto 0 do

X ← multiply(X ,X );if ei = 1 then

X ← multiply(X , b);end

endreturn X ;

much slower. This timing difference can be exploited to obtainfine-grained information across virtual machine boundaries.

C. Side-Channel Attacks on SGX

There have been speculations that SGX could be vulnerableto cache side-channel attacks [6]. In fact, Intel does notconsider side channels as part of the SGX threat model andthus states that SGX does not provide any specific mechanismsto protect against side-channel attacks [31]. However, theyalso explicitly state that SGX features still impair side-channelattacks. Intel recommends using SGX enclaves to protect pass-word managers and cryptographic keys against side channelsand advertises this as a feature of SGX [7]. Indeed, SGXdoes not provide special protection against microarchitecturalattacks, its focus lies on new attack vectors arising froman untrusted operating system. Xu et al. [32] show thatSGX is vulnerable to controlled channel attacks in which amalicious operating system triggers and monitors enclave pagefaults [33]. Both attacks rely on a malicious or compromisedoperating system to break into an enclave.

SGX enclaves generally do not share memory with otherenclaves, the operating system or other processes. Thus,Flush+Reload attacks on SGX enclaves are not possible. Also,DRAM-based attacks cannot be performed from a maliciousoperating system, as the hardware prevents any operating sys-tem accesses to DRAM rows in the EPC. However, enclavescan mount DRAM-based attacks on other enclaves because allenclaves are located in the same physical EPC.

D. Side-Channel Attacks on RSA

RSA is widely used to create asymmetric signatures, and isimplemented by virtually every TLS library, such as OpenSSLor mbedTLS, formerly known as PolarSSL. mbedTLS is usedin many well-known open source projects such as cURL andOpenVPN. The small size of mbedTLS is well suitable for thesize-constrained enclaves of Intel SGX.

RSA essentially involves modular exponentiation with a pri-vate key, where the exponentiation is typically implemented assquare-and-multiply, as outlined in Algorithm 1. The algorithmsequentially scans over all exponent bits. Squaring is donein each step while multiplication is only carried out if the

corresponding exponent bit is set. An unprotected implemen-tation of square-and-multiply is vulnerable to a variety of side-channel attacks, in which an attacker learns the exponent bydistinguishing the square step from the multiplication step [1],[23].

mbedTLS uses a windowed square-and-multiply routine forthe exponentiation. To minimize the memory footprint of thelibrary, the official knowledge base suggests setting the win-dow size to 1 [34]. With a fixed upper enclave memory limitin current microarchitectures, it is reasonable to follow thisrecommendation. However, a window size of 1 is equivalentto the basic square-and-multiply exponentiation, as shown inAlgorithm 1. Liu et al. [19] showed that if an attack on awindow size of 1 is possible, the attack can be extended toarbitrary window sizes.

Earlier versions of mbedTLS were vulnerable to a timingside-channel attack on RSA-CRT [35]. Due to this attack,current versions of mbedTLS implement a constant-time Mont-gomery multiplication for RSA. Additionally, instead of usinga dedicated square routine, the square operation is carried outusing the multiplication routine as illustrated in Algorithm 1.Thus, there is no leakage from a different square and multipli-cation routine as exploited in previous attacks on square-and-multiply algorithms [19], [23], [36], [37]. However, Liu et al.[19] showed that the secret-dependent accesses to the bufferb still leak the exponent.

Boneh et al. [38] and Blomer et al. [39] showed that it isfeasible to recover the full RSA private key if only some ofeither the most significant or least significant bits are known.Halderman et al. [40] showed that it is even possible to recovera full RSA key if up to 12% of random bits are corrupted.Heninger et al. [41] improved these results and recovered afull key for random unidirectional corruptions of up to 46%.

III. THREAT MODEL AND ATTACK SETUP

In this section, we present our threat model. We demonstratea malware that circumvents SGX’s and Docker’s isolationguarantees. We successfully mount a Prime+Probe attackon an RSA signature computation running inside a differentenclave, on the outside world, and across container boundaries.

A. High-Level View of the Attack

In our threat model, both the attacker and the victim arerunning on the same physical machine. The machine can eitherbe a user’s local computer or a host in the cloud. In thecloud scenario, the victim has its enclave running in a Dockercontainer to provide services to other applications running onthe host. Docker containers are well supported on many cloudproviders, e.g., Amazon [42] or Microsoft Azure [43]. As thesecontainers are more lightweight than virtual machines, a hostcan run up to several hundred containers simultaneously. Thus,the attacker has good chances to get a co-located container ona cloud provider.

Figure 2 gives an overview of our native setup. The victimruns a cryptographic computation inside the enclave to protectit against any attacks. The attacker tries to stealthily extract

SGX

Malware(Prime+Probe)

Loader

Attacker

L1/L2 Cache

SGX

RSASignature

+ private key

Public API

Victim

L1/L2 Cache

Shared LLC

Fig. 2. The threat model: both attacker and victim run on the same physicalmachine in different SGX enclaves.

secrets from this victim enclave. Both the attacker and thevictim use Intel’s SGX feature and are therefore subdividedinto two parts, the enclave and loader, i.e., the main programthat instantiates the enclave.

The attack is a multi-step process that can be divided into anonline and an offline phase. Section IV describes the onlinephase, in which the attacker first locates the victim’s cachesets that contain the secret-dependent data of the RSA privatekey. The attacker then monitors the identified cache sets whiletriggering a signature computation. Section V gives a detailedexplanation of the offline phase in which the attacker recoversa private key from collected traces.

B. Victim

The victim is an unprivileged program that uses SGX toprotect an RSA signing application from both software andhardware attackers. Both the RSA implementation and theprivate key reside inside the enclave, as suggested by Intel [7].Thus, they can never be accessed by system software ormalware on the same host. Moreover, information leakagefrom the enclave should not be possible due to hardwareisolation and memory encryption. The victim uses the RSAimplementation of the widely deployed mbedTLS library, thatrelies on constant-time Montgomery multiplications. The vic-tim application provides an API to compute a signature forprovided data.

C. Attacker

The attacker runs an unprivileged program on the same hostmachine as the victim. The goal of the attacker is to stealthilyextract the private key from the victim enclave. Therefore,the attacker uses the API provided by the victim to triggersignature computations.

The attacker targets the exponentiation step of the RSAimplementation. To perform the exponentiation in RSA,mbedTLS uses a windowed square-and-multiply algorithm inMontgomery domain. The window size is fixed to 1, as

suggested by the official knowledge base [34]. If successful,the attack can be extended to arbitrary window sizes [19].

To prevent information leakage from function calls,mbedTLS uses the same function (mpi_montmul) for boththe square and the multiply operation (see Algorithm 1).The mpi_montmul takes two parameters that are multipliedtogether. For the square operation, the function is calledwith the current buffer as both arguments. For the multiplyoperation, the current buffer is multiplied with a buffer holdingthe multiplier. This buffer is allocated in the calling functionmbedtls_mpi_exp_mod using calloc. Due to the de-terministic behavior of the tlibc’s calloc implementation,the used buffers always have the same virtual and physicaladdresses. Thus the buffers are always in the same cache sets.The attacker can therefore mount a Prime+Probe attack onthe cache sets containing the buffer.

In order to remain stealthy, all parts of the malware thatcontain attack code reside inside an SGX enclave. The enclavecan protect the encrypted real attack code by only decryptingit after a successful remote attestation after which the enclavereceives the decryption key. As pages in SGX can be mappedas writable and executable, self-modifying code is possible andtherefore code can be encrypted. Consequently, the attack iscompletely stealthy and invisible from anti-virus software andeven from monitoring software running in ring 0. Note that ourproof-of-concept implementation does not encrypt the attackcode as this has no impact on the attack.

The loader does not contain any suspicious code or data,it is only required to start the enclave. The exfiltrated datafrom inside the malicious enclave will only be handed to theloader in an encrypted form. The loader may also provide aTLS endpoint through which the enclave can send encrypteddata to an attacker’s server.

D. Operating System and Hardware

Previous work was mostly focused on attacks on enclavesfrom untrusted cloud operating systems [6], [9], [44]–[46].However, in our attack we do not make any assumptions onthe underlying operating system, i.e., we do not rely on amalicious operating system. Both the attacker and the victimare unprivileged user space applications. Our attack works on afully-patched recent operating system with no known softwarevulnerabilities, i.e., the attacker cannot elevate its privileges.

Our only assumption on the hardware is that attacker andvictim run on the same host system. This is the case on bothpersonal computers as well as on co-located Docker instancesin the cloud. As SGX is currently only available on Intel’sSkylake microarchitecture, it is valid to assume that the hostis a Skylake system. Consequently, we know that the last-levelcache is shared between all CPU cores.

E. Malware Detection

We expect the cloud provider to run state-of-the-art mal-ware detection software. We assume that malware detectionsoftware is able to monitor the behavior of containers or eveninspect the content of containers. Moreover, the user can run

anti-virus software and monitor programs inside the container.This software can either protect the data from infections or theinfrastructure from attacks.

Standard malware detection methods are either signature-based, behavioral-based or heuristics-based [47]. Signature-based detection is used by virus scanners to match bytesequence insides executables against a list of such sequencesextracted from known malware. This method is fast andrarely causes false-positives, but can only detect known mal-ware [48]. In addition to signature-based detection, modernvirus scanners implement behavior-based analysis. Behavior-based analysis has the potential to detect new malware by mon-itoring system activity, API calls, and user interactions [48].

We also assume the presence of detection mechanisms usingperformance counters, to detect malware [49] and microarchi-tectural attacks [50], which are more targeted to our attack.

IV. EXTRACTING PRIVATE KEY INFORMATION

In this section, we describe the online phase of our attack.We first build primitives necessary to mount this attack. Thenwe show in two steps how to locate and monitor cache setsto extract private key information.

A. Attack Primitives in SGX

Successful Prime+Probe attacks require two primitives: ahigh-resolution timer to distinguish cache hits and misses anda method to generate an eviction set for an arbitrary cacheset. Due to the restrictions of SGX enclaves, we cannot relyon existing Prime+Probe implementations, and therefore werequire new techniques to build a malware from within anenclave.

High-resolution Timer. The unprivileged rdtsc andrdtscp instructions, which read the timestamp counter, areusually used for fine-grained timing outside enclaves. InSGX1, these instructions are not permitted inside an SGXenclave, as they might cause a VM exit [51]. Therefore, wehave to rely on a different timing source.

Lipp et al. [26] demonstrated a counting thread as a high-resolution alternative on ARM where no unprivileged high-resolution timer is available. The idea is to have a dedicatedthread incrementing a global variable in an endless loop. Asthe attacks only rely on accurate timing differences and noton absolute timestamps, this global variable serves directly asthe timing source.

We require a minimum resolution in the order of 10 cycles toreliably distinguish cache hits from misses as well as DRAMrow hits from row conflicts. To achieve the highest numberof increments, we handcraft the counter increment in inlineassembly. According to Intel [52], the fastest instructions onthe Skylake microarchitecture are inc and add with both alatency of 1 cycle and a throughput of 0.25 cycles/instructionwhen executed with a register as an operand. The countervariable has to be accessible across threads, thus it is necessaryto store the counter variable in memory. Memory addresses asoperands incur an additional cost of approximately 4 cyclesdue to L1 cache access times [52]. To reduce the cost of

1 mov &c o u n t e r , %rcx2 1 : i n c l (%rcx )3 jmp 1b

Listing 1: A counting thread that emulates rdtsc.

the jmp instruction, we tried to unroll the loop up to thepoint where we get the most increments per CPU cycle.However, our experiments showed that the unrolling tendsto rather have negative effects on the performance. On ourtest machine, the code from Listing 1 achieves one incrementevery 4.7 cycles, which is an improvement of approximately2% over the assembly code generated by gcc on the highestoptimization level (-O3).

We can improve the performance—and thus theresolution—further, by exploiting the fact that only thecounting thread is writing to the counter variable. Reading thecounter variable from memory is therefore never necessaryas the value will not be changed by any other thread. Togain a higher performance from this observation, we haveto eliminate the CPU’s read access to the counter variable.Executing arithmetic operations directly on the memorylocation is thus not an option anymore, and it is necessaryto perform any operation with data dependency on a CPUregister. Therefore, we introduce a “shadow counter variable”which is always held in a CPU register. The arithmeticoperation (either add or inc) is performed using this registeras the operand, unleashing the low latency and throughputof these instructions. As registers cannot be shared acrossthreads, the shadow counter has to be moved to memoryusing the mov instruction after each increment. Similar to theinc and add instruction, the mov instruction has a latencyof 1 cycle and a throughput of 0.5 cycles/instruction whencopying a register to a memory location. Listing 2 shows theimproved counting thread. This counting thread is capable ofincrementing the variable by one every 0.87 cycles, which isan improvement of 440% over the code in Listing 1. In fact,this version is even 15% faster than the native timestampcounter, thus giving us a reliable timing source that evenhas a higher resolution. This new method might open newpossibilities of side-channel attacks that leak informationthrough timing on a sub-rdtsc level. Figure 3 shows theperformance comparison of the C version, the assemblyversion, the optimized assembly version, and the nativetimestamp counter as a baseline. Although the method withthe shadow counter has the most instructions in the loop body,and an increase of 100% in code size compared to Listing 1,it has the best performance. Due to multiple execution units,pipelining, and the absence of memory dependencies, oneincrement can be carried out in less than 1 cycle on theSkylake microarchitecture even though each instruction has alatency of 1 cycle [53].

Eviction Set Generation. Prime+Probe relies on evictionsets, i.e., we need to find virtual addresses that map to the samecache set. An unprivileged process cannot translate virtual

1 mov &c o u n t e r , %rcx2 1 : i n c %rax3 mov %rax , (%rcx )4 jmp 1b

Listing 2: The improved fast counting thread that acts as theemulation of rdtsc.

C asm optimized asm rdtsc

1

2

3

4

5 4.76 4.7

0.87 11

cycl

es/in

crem

ent

Fig. 3. Comparisons of different variants for the counter thread to Intel’snative timestamp counter as the baseline.

to physical addresses and therefore cannot simply search forvirtual addresses that fall into the same cache set. This limita-tion also applies to enclaves, as they are always unprivileged.Liu et al. [19] and Maurice et al. [21] demonstrated algorithmsto build eviction sets using large pages by exploiting the factthat the virtual address and the physical address have the samelowest 21 bits. At least in the current version, SGX doesnot support large pages, making this approach inapplicable.Oren et al. [54] and Gruss et al. [22] demonstrated fullyautomated methods to generate an eviction sets for a givenvirtual address. However, the method of Oren et al. [54]uses a slow pointer-chasing approach and needs to find aneviction set without any assumptions, consuming more time.The method by Gruss et al. [22] has the overhead of findingan eviction strategy and eviction set without any assumptions.Thus, applying their approach for our purposes would consumemultiple hours on average before even starting the actualPrime+Probe attack.

We propose a new method to recover the cache setfrom a virtual address without relying on large pages. Themethod requires that an array within an SGX enclave isbacked by physically contiguous pages. We verified thatwe have contiguous pages by inspecting Intel’s SGX driverfor Linux [55]. When initializing a new enclave, the func-tion isgx_page_cache_init creates a list of avail-able physical pages for the enclave. These pages start ata base physical address and are contiguous. If a physicalpage is mapped, e.g., due to a page fault, the functionisgx_alloc_epc_page_fast removes and returns thehead of the list.

The idea is to exploit the DRAM timing differences that aredue to the DRAM organization and to use the DRAM mappingfunctions [30] to recover physical address bits. Alternatelyaccessing two virtual addresses that map to the same DRAMbank but a different row is significantly slower than any other

TABLE IREVERSE-ENGINEERED DRAM MAPPING FUNCTIONS USING THE

METHOD FROM PESSL ET AL. [30].

Address Bit2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 02 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6

2 DIMMs

Channel ⊕ ⊕ ⊕ ⊕ ⊕ ⊕BG0 ⊕ ⊕BG1 ⊕ ⊕BA0 ⊕ ⊕BA1 ⊕ ⊕Rank ⊕ ⊕

combination of virtual addresses. For the first address of aDRAM row, the least-significant 18 bits of the physical ad-dress are ‘0’, because the row index only uses physical addressbits 18 and upwards. Thus, we scan memory sequentially foran address pair in physical proximity that causes a row conflict.As SGX enclave memory is allocated in a contiguous way wecan perform this scan on virtual addresses.

A virtual address pair that causes row conflicts at thebeginning of a row satisfies the following constraints:

1) The bank address (BA), bank group (BG), rank, andchannel must be the same for both virtual addresses.Otherwise, a row conflict is not possible.

2) The row index must be different for both addresses.3) The difference of the two physical addresses (of the

virtual addresses) has to be at least 64B (the size of onecache line) but should not exceed 4 kB (the size of onepage).

4) Physical address bits 6 to 22 have the same known value,all 0 for the higher address and all 1 for the loweraddress, as only bits in this range are used by the mappingfunction.

For all virtual addresses satisfying these constraints, bits 6 to22 have a known value. Thus, we know the exact cache setfor these virtual addresses.

Table I shows the reverse-engineered DRAM mapping func-tion for our test machine, an Intel Core i5-6200U with 12GBmain memory. The row index is determined by the physicaladdress bits starting from bit 18.

To find address pairs fulfilling the aforementioned con-straints, we modeled the mapping function and the constraintsas an SMT problem and used the Z3 theorem prover [56]to provide models satisfying the constraints. The model wefound yields pairs of physical addresses where the upperaddress is 64B apart from the lower one. There are four suchaddress pairs within every 4MB block of physical memorysuch that each pair maps to the same bank but a differentrow. The least-significant bits of the physical address pairs areeither (0x3fffc0, 0x400000), (0x7fffc0, 0x800000),(0xbfffc0, 0xc00000) or (0xffffc0, 0x1000000) forthe lower and higher address respectively. Thus, at least 22 bitsof the higher addresses least-significant bits are 0.

Figure 4 shows the average access time for address pairswhen iterating over a 2MB array. The highest two peaks showrow conflicts, i.e., the row index changes while the bank, rank,

0 500 1,000 1,500 2,000

400

450

500

550

600

650

Array index [kB]

Acc

ess

time

[CPU

cycl

es]

Fig. 4. Access times when alternately accessing two addresses which are64B apart. The (marked) high access times indicate row conflicts.

and channel stay the same. As the cache set is determined bythe bits 6 to 17, the higher address has the cache set index 0 atthese peaks. Based on the assumption of contiguous memory,we can generate addresses mapping to the same cache set byadding multiples of 256KB to the higher address.

As the last-level cache is divided into multiple parts calledcache slices, there is one cache set per slice for each cache setindex. Thus, we will inherently add addresses to our generatedeviction set that have no influence on the eviction althoughthey have the correct cache set index. For the eviction set, itis necessary to only use addresses that map to the same cacheslice. However, to calculate the cache slice from a physicaladdress, all bits of the physical address are required [57].

As we are not able to directly calculate the cache slice, weuse another approach. We add our calculated addresses fromthe correct cache set to our eviction set until the eviction rateis sufficiently high. Then, we try to remove single addressesfrom the eviction set as long as the eviction rate does not drop.Thus, we remove all addresses that do not contribute to theeviction, and the result is a minimal eviction set. Algorithm 2shows the full algorithm to generate a minimal eviction set.Our approach takes on average 2 seconds per cache set, aswe already know that our addresses map to the correct cacheset. This is nearly three orders of magnitude faster than theapproach of Gruss et al. [22].

B. Identifying Vulnerable Sets

Now that we have a reliable high-resolution timer and amethod to generate eviction sets, we can mount the firststage of the attack and identify the vulnerable cache sets. Aswe do not have any information on the virtual or physicaladdresses of the victim, we have to scan the last-level cache forcharacteristic patterns that correspond to the signature process.We consecutively mount a Prime+Probe attack on every cacheset while the victim is executing the exponentiation step. Thisallows us to log cache misses due to a victim’s activity insidethe monitored cache set.

Algorithm 2: Generating the eviction setinput : memory : char[8× 1024× 1024], set : intoutput: eviction set : char*[n]

border ← 0;border index ← 0;for i← 0xFC0 to 4× 1024× 1024 step 4096 do

time ← hammer(memory [i ], memory [i + 64 ]);if time > border then

border ← time;border index ← i+ 64;

endendaddr ← (&memory [border index ]) + set � 6;n← 0;repeat

full set [n]← addr + n× 256KB;eviction ← evict(full set , n);n← n+ 1;

until eviction > 99%;for i← 0 to n do

removed ← full set [i];full set [i]← NULL;if evict(full set , n) < 99% thenfull set [i]← removed ;

endlen← 0;for i← 0 to n do

if full set [i] 6= NULL theneviction set [len]← full set [i];len ← len + 1;

endend

0 100 200 300 400 500 600 7000

2

4

6

8

·106

Access time [CPU cycles]

Num

ber

ofca

ses

no cache activity cache activity

Fig. 5. Histogram showing the runtime of the prime step for cache activityin the same set and no cache activity in the same set.

First, we fill the cache lines of this cache set with theeviction set using the access pattern shown in Algorithm 3.This step is called the prime step. We expect our addresses tostay in the cache if the victim has no activity in this specificcache set. Second, we measure the runtime of this algorithm

to infer information about the victim. We refer to this step asthe probe step. Figure 5 shows the timings of a probe stepwith and without cache activity of the victim. If there wasno activity of the victim inside the cache set, the probe stepis fast as all addresses of the eviction set are still cached. Ifwe encounter a high timing, we know that there was activityinside the cache set and at least one of our addresses wasevicted from the cache set. For all following measurements,the probe step also acts as the prime step. The measurementensures that the eviction set is cached again for the next round.

We can identify multiple cache sets showing this distinctivepattern which consists of three parts. The start of an exponen-tiation is characterized by a high usage of the cache set due toclearing and initialization of the used buffers. It is followed bythe actual exponentiation that depends on the secret exponent.The exponentiation ends with another high peak where thebuffer is cleared, followed by no cache misses anymore, i.e.,it is only influenced by background noise.

To automatically find these sets, we apply a simple peakdetection to find the rightmost peak. If we can identify anotherpeak before that within a certain range, we assume that thiscache set is used by our target buffer. Depending on the sizeof the RSA exponent, we get multiple cache sets matchingthis pattern. Our experiments showed that using identifiedsets which are neither at the beginning nor at the end yieldsgood results in the actual attack. The first and last cache setmight be used by neighboring buffers and they are more likelyto be prefetched [23], [24]. Thus, they are more prone tomeasurement errors.

C. Monitoring Vulnerable Sets

Once we have identified a cache set which is used by theexponentiation, we can collect the actual traces. The measure-ment method is the same as for detecting the vulnerable cachesets, i.e., we again use Prime+Probe. Due to the deterministicbehavior of the heap allocation, the address of the attackedbuffer does not change on consecutive exponentiations. Thus,we can collect multiple traces of the signature process.

To maintain a high sampling rate, we keep the post-processing during the measurements to a minimum. Moreover,it is important to keep the memory activity at a minimum tonot introduce additional noise on the cache. Thus, we only savethe timestamps of the cache misses for further post-processing.

Figure 6 shows the measurement for one run of the signaturealgorithm. We can see intervals with multiple cache misses andintervals without cache misses, corresponding to high cacheusage and no cache usage of the victim, respectively. As acache miss takes longer than a cache hit, the effective samplingrate varies depending on the number of cache misses. We haveto consider this effect in the post-processing as it induces anon-constant sampling interval.

V. RECOVERING THE PRIVATE KEY

In this section, we describe the offline phase of our attack:recovering the private key from the recorded traces of the

0 200 400 600200

400

600

Sample

Acc

ess

time

[cyc

les]

Fig. 6. Dense areas indicate a high cache-hit rate, white areas are intervalswith cache misses.

Algorithm 3: Attacker accessing a set.input: n: int,

addrs: int[n]

for i← 0 to n − 2 do*addrs[i];*addrs[i+1];*addrs[i+2];*addrs[i];*addrs[i+1];*addrs[i+2];

end

victim enclave. This can either be done inside the malware’senclave or on the attacker’s server.

Ideally, one would combine multiple traces by aligningthem and averaging out noise. The more traces are combined,the more noise is eliminated. From the resulting averagedtrace one can easily extract the private key. However, thetraces obtained in our attack are affected by several noisesources. Most of them alter the timing, making trace alignmentdifficult. Among them are interrupts which lead to contextswitches and therefore descheduling of the attacker or thevictim. Other sources of noise include unrelated activity onthe enclave’s cache sets and varying CPU clock frequency dueto power management. Although methods exist for aligningsuch traces [58], [59], we opt for a different strategy. Insteadof attempting to align traces beforehand, we pre-process alltraces individually and extract a partial key out of each trace.These partial keys likely suffer from random insertion anddeletion errors as well as from bit flips. To eliminate thoseerrors, multiple partial keys are combined in the key recoveryphase. This approach has much lower computational overheadthan trace alignment since key recovery is performed on partialkeys of length 4KB instead of full traces containing severalthousand measurements.

Key recovery comes in three steps. First, traces are pre-processed. Second, a partial key is extracted from each trace.Third, the partial keys are merged to recover the private key.

A. Pre-processing

In the pre-processing step we filter and resample rawmeasurement data. Figure 7 shows a trace segment before(top) and after pre-processing (bottom). High values in the

raw measurement data correspond to cache misses whereaslow values indicate cache hits. Timing measurements have avarying sample rate. This is because a cache miss delays thenext measurement while cache hits allow more frequent mea-surements. To simplify the subsequent steps, we convert themeasurements to a constant sampling rate. Therefore, we spec-ify sampling points 1000 cycles apart. At each sampling pointwe compute the normalized sum of squared measurementswithin a 10 000 cycle window. Squaring the measurements isnecessary to account for the varying sampling rate. If themeasurements exceed a certain threshold, they are consideredas noise and are discarded. If too few measurements areavailable within a window, e.g., due to an interrupt, we applylinear interpolation. The resulting resampled trace shows highpeaks at locations of cache misses, indicating a ‘1’ in the RSAexponent, as shown in Figure 7 on the bottom.

B. Partial Key Extraction

To automatically extract a partial key from a resampledtrace, we first run a peak detection algorithm. We deleteduplicate peaks, e.g., peaks where the corresponding RSAmultiplications would overlap in time. We also delete peaksthat are below a certain adaptive threshold, as they do notcorrespond to actual multiplications. Using an adaptive thresh-old is necessary since neither the CPU frequency nor ourtiming source (the counting thread) is perfectly stable. Thevarying peak height is shown in the right third of Figure 7.The adaptive threshold is the median over the 10 previouslydetected peaks. If a peak drops below 90% of this threshold,it is discarded. The remaining peaks correspond to the ‘1’s inthe RSA exponent and are highlighted in Figure 7. ‘0’s canonly be observed indirectly in our trace as square operationsdo not trigger cache activity on the monitored sets. ‘0’s appearas time gaps in the sequence of ‘1’ peaks, thus revealing allpartial key bits. Note that since ‘0’s correspond to just onemultiplication, they are roughly twice as fast as ‘1’s.

A partial key might suffer from bit flips, random insertions,and deletions, when compared to the correct key. When acorrect peak is falsely discarded, the corresponding ‘1’ isinterpreted as two ‘0’s. Likewise, if noise is falsely interpretedas a ‘1’, this cancels out two ‘0’s. Moreover, if the attackeris not scheduled, we miss certain key bits in the trace. If thevictim is not scheduled, we see a region of cache inactivity inthe measurement that cannot be distinguished from true ‘0’s.Finally, if both the attacker and the victim are descheduled,this gap does not show up prominently in the trace since thecounting thread is also suspended by the interrupt. This is infact an advantage of a counting thread over the use of thenative timestamp counter. The remaining errors in the partialkeys are corrected in the final key recovery.

C. Final Key Recovery

In the final key recovery, we merge multiple partial keys toobtain the full key. We quantify partial key errors using theedit distance [60]. The edit distance between a partial key andthe correct key gives the number of bit insertions, deletions

1 1 1 00 1 1 1 0 1 1 1 0000000 1 000 1 0 1 00 1 1 00 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 000 1 00 1 1 1 0 1 000 1 1 1 0000 1 1 1

Fig. 7. On top is one trace’s raw measurement over 4 000 000 cycles. The peaks in the resampled trace on the bottom clearly indicate ‘1’s.

TABLE IIBIT-WISE KEY RECOVERY OVER FIVE PARTIAL KEYS.

No. Recovered key1 10111110001100110010111101010000100...2 10111011000111001100101101101010000...3 10111110001110011001011110101000010...4 10111110001110001100101111010100001...5 10111110001110011001011100010100001...

and flips necessary to transform the partial key into the correctkey.

Algorithm 4 shows the pseudo code for the final keyrecovery. The full key is recovered bitwise, starting from themost-significant bit. The correct key bit is the result of themajority vote over the corresponding bit in all partial keys.Before proceeding to the next key bit, we correct all wrongpartial keys which did not match the recovered key bit. Tocorrect the current bit of the wrong partial key, we compute theedit distance to all partial keys that won the majority vote. Toreduce performance overhead, we calculate the edit distance,not over the whole partial keys but only over a lookaheadwindow of a few bits. The output of the edit distance algorithmis a list of actions necessary to transform one key into theother. We apply these actions via majority vote until the key bitof the wrong partial key matches the recovered key bit again.Table II gives an example where the topmost 5 bits are alreadyrecovered (underlined). The sixth key bit is recovered as ‘1’,since all partial key bits—except for the second one—are ‘1’(bold). The incorrect ‘0’ of the second partial key is deletedbefore proceeding to the next bit. This procedure is repeatedfor all key bits until the majority of partial keys reached thelast bit.

VI. EVALUATION

In this section, we evaluate the presented methods bybuilding a malware enclave attacking a co-located enclavethat acts as the victim. As discussed in Section III-B, weuse mbedTLS, in version 2.3.0. The small code and memoryfootprint and self-containment of mbedTLS makes it easy touse in SGX enclaves.

Cache Set Detection (3min)

Prime+Probe (5 s)

Pre-Processing (110 s)

Key Recovery (20 s)

Fig. 8. A high-level overview of the average times for each step of the attack.

A. RSA Key Sizes and Exploitation

For the evaluation, we attack a 4096-bit RSA key as thisprovides long-term security, based on the recommendation ofNIST [61]. Higher bit sizes are rarely used outside tinfoil-hatenvironments.

Table III shows various RSA key sizes and the correspond-ing buffer sizes in mbedTLS. The runtime of the multiplicationfunction increases exponentially with the size of the key.Hence, larger keys improve the measurement resolution of theattacker. In terms of cache side-channel attacks, large RSAkeys do not provide higher security but degrade side-channelresistance [23], [62]–[64].

B. Native Environment

We use a Lenovo ThinkPad T460s running Ubuntu 16.10.This computer supports SGX1 using Intel’s SGX driver. Thehardware details for the evaluation are shown in Table IV.Both the attacker enclave and the victim enclave are runningon the same machine. We trigger the signature process usingthe public API of the victim’s enclave.

Figure 8 gives an overview of how long the individual stepsof an average attack take. The runtime of automatic cacheset detection varies depending on which cache sets are usedby the victim. The attacked buffer spans 9 cache sets, out ofwhich 6 show low bit-error rate, as shown in Figure 9. Forthe attack we select one of the 6 sets, as the other 3 sufferfrom too much noise. The noise is mainly due to the buffer notbeing aligned to the cache set. Furthermore, as already knownfrom previous attacks, the hardware prefetcher can induce asignificant amount of noise [23], [24].

Detecting one vulnerable cache set within all 2048 cachesets requires about 340 trials on average. With a monitoring

Algorithm 4: RSA private key recovery.input : keys: boolean[], lookahead : intoutput: key : boolean[]

key ← [];i ← 0;while True do

keybit ← majority(keys , i );if keybit =⊥ then

return key ;endkey [i ]← keybit ;correct ← {};wrong ← {};foreach k in keys do

if k [i ] = keybit thencorrect ← correct ∪ k ;

elsewrong ← wrong ∪ k ;

endendforeach kw in wrong do

actions ← {};foreach kc in correct do

actions ← actions∪EditDistance(kw [i : i + lookahead ],

kc[i : i + lookahead ]);endai ← 0;while kw [i ] 6= keybit do

action ← majority(actions , ai );apply action to kw [i ];ai++;

endendi++;

end

function majority(set , idx ) begincounter []← 0;foreach array in set do

element ← array [idx ];increment counter [element ];

endreturn element with max. counter ;

end

time of 0.21 s per cache set, we require a maximum of 72 sto eventually capture a trace from a vulnerable cache set.Thus, based on our experiments, we estimate that cache setdetection—if successful—always takes less than 3min.

One trace spans 220.47 million CPU cycles on average.Typically, ‘0’ and ‘1’ bits are uniformly distributed in thekey. The estimated number of multiplications is thereforehalf the bit size of the key. Thus, the average multiplicationtakes 107 662 cycles. This differs from the values shown inTable III because the attacker constantly evicts the victim’s

TABLE IIIRSA KEY SIZES AND THE CORRESPONDING CPU CYCLES TO EXECUTE

ONE MULTIPLICATION.

Key size Buffer size Cache sets CPU cycles

1024b 136B 3 17642048b 264B 5 66244096b 520B 9 25 4628192b 1032B 17 100 440

TABLE IVEXPERIMENTAL SETUP.

Environment CPU model Cores LLC associativity

Native Core i5-6200U 2 12Docker Core i5-6200U 2 12

buffer, inherently causing a slowdown. In addition, one couldartificially slow down a victim through constant eviction toimprove the performance of cache attacks. This is known asperformance degradation [65]. However, as the Prime+Probemeasurement takes on average 734 cycles, we do not have toartificially slow down the victim and thus remain stealthy.

When looking at a single trace, we can already recoverabout 96% of the RSA private key, as shown in Figure 9.For a full key recovery we combine multiple traces usingour key recovery algorithm, as explained in Section V-C. Wefirst determine a reasonable lookahead window size. Figure 10shows the performance of our key recovery algorithm forvarying lookahead window sizes on 7 traces. For lookaheadwindows smaller than 20, bit errors are pretty high. In thatcase, the lookahead window is too small to account for allinsertion and deletion errors, causing relative shifts betweenthe partial keys. The key recovery algorithm is unable to alignpartial keys correctly and incurs many wrong “correction”steps, increasing the overall runtime as compared to a windowsize of 20. While a lookahead window size of 20 alreadyshows a good performance, a window size of 30 or more doesnot significantly reduce the bit errors. Therefore, we fixed thelookahead window size to 20.

To remove the remaining bit errors and get full key recovery,we have to combine more traces. Figure 11 shows how thenumber of traces affects the key recovery performance. Wecan recover the full RSA private key without any bit errors bycombining only 11 traces within just 18.5 sec. This results ina total runtime of less than 130 sec for the offline key recoveryprocess.

Generalization. Based on our experiments we can deducethat the same attacks are also possible in a weaker scenario,where only the attacker is inside the enclave. On most comput-ers, applications handling cryptographic keys are not protectedby SGX enclaves. From the attacker’s point of view, attackingsuch an unprotected application does not differ from attackingan enclave. We only rely on the last-level cache, which isshared among all applications, independently of whether theyrun inside an enclave or not. We empirically verified that such

1 2 3 4 5 6 7 8 90

10

20

30

4033.68

29.87 29.83

6.96

4.19 3.756.1 5.36

4.29

Bit-

erro

rra

tio[%

]

Fig. 9. The 9 cache sets that are used by a 4096b key and their error ratewhen recovering the key from a single trace.

10 15 20 25 30 35 40 45 500

10

20

30

40 Runtime [s]39

31

4 3 2 2 2 2 2

6.7

10.3 9.711.9

15

19.6

25.6

31.2

37.6

Lookahead window size

Bit-

erro

rs

Fig. 10. Up to a lookahead window size of 30, an increased window sizereduces the number of bit errors while increasing recovery runtime. Themeasurement is conducted with 7 traces.

attacks on the outside world are possible and were again ableto recover RSA private keys.

C. Virtualized Environment

We now show that the attack also works in a virtualizedenvironment.

As described in Section II-A, no hypervisor with SGXsupport was available at the time of our experiments. Insteadof full virtualization using a virtual machine, we used thelightweight Docker containers. Docker containers are also usedby large cloud providers, e.g., Amazon [42] or MicrosoftAzure [43]. To enable SGX within a container, the hostoperating system has to provide SGX support. The SGX driveris then simply shared among all containers. Figure 12 showsour setup where the SGX enclaves communicate directly withthe SGX driver of the host operating system. Applicationsrunning inside the container do not experience any differenceto running on a native system. They can use any functionalityprovided by the host operating system. Consequently, theunmodified malware also works inside containers.

Considering the performance within Docker, only I/O oper-ations and network access have a measurable overhead [66].Operations that only depend on memory and CPU do not

3 5 7 9 110

20

40

60

80

Runtime [s]

69

15

41 0

4.16.3

9.7

13.3

18.5

Traces

Bit-

erro

rs

Fig. 11. With the number of captured traces, the number of bit errors decreasewhile the runtime to recover the key increases.

SGX

Malware(Prime+Probe)

Loader

Attacker container

SGX

RSA+ private key

API

Victim container

Linux kernel

SGX driver

Docker engine

Fig. 12. Running the SGX enclaves inside Docker containers to providefurther isolation. The host provides both containers access to the same SGXdriver.

see any performance penalty, as these operations are notvirtualized. Thus, caches are also not affected by the container.

We were successfully able to attack a victim from within aDocker container without any changes in the malware. We caneven perform a cross-container attack, i.e., both the malwareand the victim are running inside different containers, withoutany changes. As expected, we require the same number oftraces for a full key recovery. These results confirm thatcontainers do not provide additional protection against ourmalware at all.

Furthermore, we can speculate whether our malware wouldalso work within virtual machines based on the experimentalKVM support description [67]. Many cross-VM cache attackshave been demonstrated in the past years [2], [4], [19], asthe CPU cache is a shared resource in virtual machines. Thisdoes not change with SGX, and thus, enclaves inside virtualmachines will also share the last-level cache. The experimentalimplementation for KVM relies on the host system’s SGXdriver to provide memory pages to the enclave inside thevirtual machine. We thus expect that our malware will work

across virtual machines either with only minor changes, oreven without any adaptations.

VII. COUNTERMEASURES

In this section, we discuss advantages and disadvantages ofdifferent countermeasures. Previously presented countermea-sures mostly cannot be applied to a scenario where a maliciousenclave performs a cache attack and no assumptions about theoperating system are made. We group countermeasures into 3categories, based on whether they require:

1) a modification of the enclave (source level),2) a modification of the operating system (OS level) assum-

ing the operating system is benign,3) a change in hardware (hardware level).

A. Source Level

Exponent Blinding. A generic side-channel protection forRSA is exponent blinding [12]. To sign a message m, thesigner generates a random blinding value k for each signa-ture. The signer then calculates the signature as md+k·φ(N)

mod N where d is the private key and N is the RSA modulus.An attacker will only be able to measure the blinded

exponent on every execution. When a single-trace key recoveryis not possible, the attacker has to wait for collisions, i.e.,signatures where the same blinding was used. For a sufficientlylarge blinding factor k, e.g., 64 bit, this becomes infeasible inpractice. As the exponent grows with the blinding factor, thissolution is a trade-off between performance and side-channelresistance. This has no effect if key recovery from a singletrace is possible, only if more than one trace is required.Furthermore, this countermeasure relies on the presence ofa random number source.

Exponent blinding is specific to certain cryptographic op-erations, such as RSA signature computations. It will preventthe proposed attack, but other parts of the signature processmight still be vulnerable to an attack [68].

Bit Slicing. Bit slicing is a technique originally proposedby Biham [69] to improve the performance of DES. Mat-sui [70] was the first to show a bit-sliced implementation ofAES. Sudhakar et al. [71] presented a bit-sliced Montgomerymultiplication for RSA and ECC. The main idea of bit slicingis to use only bit operations for computations throughout thealgorithm. No lookup tables or branches are used in thesealgorithms and thus, they are not vulnerable to cache attacks.

Again, this countermeasure is specific to certain crypto-graphic algorithms. It requires the support of the used cryp-tography library and hardware support for streaming SIMD(SSE) instructions is necessary to achieve a reasonable perfor-mance [72]. Bit slicing can be a good software solution whilethere is no hardware countermeasure. Other countermeasuresfor cryptographic implementations have been discussed byGe et al. [1].

B. Operating System Level

Implementing countermeasures against malicious enclaveattacks on the operating system level requires trusting the

operating system. This would weaken the trust model of SGXenclaves significantly and is thus unrealistic. However, wewant to discuss the different possibilities, in order to providevaluable information for the design process of future enclavesystems.

Eliminating Timers. Removing access to high-resolutiontimers [15], [73] or decreasing the accuracy [74]–[76] isoften discussed as a countermeasure against cache attacks.However, our results using the timing counter show thatremoving precise timers is not a viable countermeasure, as weare still able to mount a high-resolution Prime+Probe attack.Moreover, on recent microarchitectures, we can even get ahigher resolution using our timing thread than with the nativehigh-resolution timestamp counter.

However, it is possible to remove access to high-resolutiontimers and all forms of simultaneous multithreading to preventthis alternative approach. This would effectively eliminate ac-cess to sufficiently accurate timers and mitigate many attacks.

Detecting Malware. One of the core ideas of SGX isto remove the cloud provider from the root of trust. Ifthe enclave is encrypted and only decrypted after successfulremote attestation, the cloud provider has no way to access thesecret code inside the enclave. However, eliminating this corefeature of SGX could mitigate malicious enclaves in practiceas the enclave binary or source code could be read by thecloud provider and scanned for malicious activities.

Heuristic methods, such as behavior-based detection, are notapplicable, as the malicious enclave does not rely on API callsor user interaction. Furthermore, for encrypted enclave code,a signature-based virus scanner has no access to the code,and the malware can easily change its signature by either re-encryption or modification of the plaintext. Thus, only the hostbinary—which contains no malicious code—can be inspectedby a virus scanner.

Herath and Fogh [50] proposed to use hardware perfor-mance counters to detect cache attacks. Subsequently, severalother approaches instrumenting performance counters to de-tect cache attacks have been proposed [77]–[79]. However,according to Intel, SGX enclave activity is not visible in thethread-specific performance counters [80]. We verified thateven performance counters for last-level cache accesses aredisabled for enclaves. Figure 13 shows the results of a simpletest program running inside a debug and pre-release enclave,and without an enclave. The visible cache hits and misses arecaused by the host application only. This makes it impossiblefor current anti-virus software and other detection mechanismsto detect the malware.

Enclave Coloring. We propose enclave coloring as an ef-fective countermeasure against cross-enclave attacks. Enclavecoloring is a software approach to partition the cache intomultiple smaller parts. Each of the parts spans over multiplecache sets, and no cache set is included in more than one part.An enclave gets one or more such cache parts. This assignmentof cache parts is either done by the hardware or by a trustedoperating system.

L1 Hits L1 Misses L3 Hits L3 Misses0

0.5

1

1.5

·109

Perf

orm

ance

coun

ter

valu

e

Debug Pre-Release Native

Fig. 13. Performance counters for caches are disabled in an enclave. Flush+Reload of one variable in a loop results in a high cache activity, which canbe seen in native environment, but not on SGX debug or pre-release mode.

If implemented in software, the operating system can splitthe last-level cache through memory allocation. The cache setis determined by bits of the physical address. The lower bits ofthe cache set index are below bit 12 and therefore determinedby the page offset, i.e., the data’s position within a 4KBpage. The upper bits of the cache set are not visible to theenclave application and can thus be controlled by the operatingsystem when allocating pages. We call these upper bits a color.Whenever an enclave requests pages from the operating system(we consider the SGX driver as part of the operating system),it will only get pages with a color that is not present in anyother enclave. This coloring ensures, that two enclaves cannothave data in the same cache set, and thus an eviction of thedata—and therefore a Prime+Probe attack—is not possibleacross enclaves. However, attacks on the operating system orother processes on the same host would still be possible.

Enclave coloring requires a trusted operating system, and istherefore not always applicable as it contradicts SGX’s ideaof having an untrusted operating system [6]. If the operatingis trusted, this is an effective countermeasure against cross-enclave cache attacks.

To prevent attacks on the operating system or other pro-cesses, it would be necessary to partition the rest of thememory as well, i.e., system-wide cache coloring [81]. God-frey et al. [82] evaluated a coloring method for hypervisorsby assigning every virtual machine a partition of the cache.They concluded that this method is only feasible for a smallnumber of partitions. As the number of simultaneous enclavesis relatively limited by the available amount of SGX memory,enclave coloring can be applied to prevent cross-enclaveattacks. Protecting enclaves from malicious applications orpreventing malware inside enclaves is however not feasibleusing this method.

Heap Randomization. Our attack relies on the fact, thatthe used buffers for the multiplication are always at the samememory location. This is indeed the case, as the memoryallocator (dlmalloc) has a deterministic behavior and usesa best-fit approach for moderate buffer sizes as used in theRSA implementation. Freeing a buffer and allocating it againwill always result in the same memory location for the buffer.

We suggest randomizing the heap allocations for securityrelevant data such as the used buffers. A randomization of theaddresses and thus cache sets bears two advantages. First, anautomatic cache set detection is not possible anymore, as theidentified set will change for the next run of the algorithm.Second, if more than one trace is required to reconstruct thekey, this countermeasure increases the number of requiredtraces by multiple orders of magnitude as the probability tomeasure the correct cache set decreases.

Although not obvious at first glance, this method requires acertain amount of trust in the underlying operating system. Amalicious operating system could assign only pages mappingto certain cache sets to the enclave, similar to enclave coloring.Thus, the randomization is limited to only a subset of cachesets, increasing the probability for an attacker to measure thecorrect cache set from 0.1% to 7%.

Intel CAT. Recently, Intel introduced an instruction setextension called CAT (cache allocation technology) [51]. WithIntel CAT it is possible to restrict CPU cores to one of theslices of the last-level cache and even to pin cache lines.Liu et al. [83] proposed a system that uses CAT to protectgeneral purpose software and cryptographic algorithms. Theirapproach can be directly applied to protect against a maliciousenclave. However, this approach also does not allow to protectenclaves from an outside attacker.

C. Hardware Level

Combining Intel CAT with SGX. Instead of using IntelCAT on the operating level it could also be used to protectenclaves on the hardware level. By changing the eenterinstruction in a way that it implicitly activates CAT for thiscore, any cache sharing between SGX enclaves and the outsideas well as co-located enclaves could be eliminated. Thus,SGX enclaves would be protected from outside attackers.Furthermore, it would protect co-located enclaves as well asthe operating system and user programs against maliciousenclaves.

Secure RAM. To fully mitigate cache- or DRAM-basedside-channel attacks memory must not be shared among pro-cesses. We propose an additional secure memory element thatresides inside the CPU. Data stored within this memory isnot cachable, thus the memory has to be fast to not incurperformance penalties.

The SGX driver can then provide a special API to ac-quire this element for temporarily storing sensitive data. Acryptographic library could use this memory to execute codewhich depends on secret keys such as the square-and-multiplyalgorithm. Providing such a secure memory element per CPUcore would even allow parallel execution of multiple enclaves.

As data from this element is only accessed by one programand is never cached, cache attacks and DRAM-based attacksare not possible anymore. Moreover, if this secure memoryis inside the CPU, it is infeasible for an attacker to mountphysical attacks or to probe the memory bus. It is unclearwhether Intel’s eDRAM implementation can already be abused

as a secure memory to protect applications against cacheattacks.

VIII. CONCLUSION

There have been speculations that SGX could be vulnerableto cache side-channel attacks and might allow the implemen-tation of super malware. However, Intel claimed that SGX fea-tures impair side-channel attacks and recommends using SGXenclaves to protect cryptographic computations. Furthermore,it was presumed that they cannot perform harmful operations.

In this paper, we demonstrated the first malware runningin real SGX hardware enclaves. We demonstrated private keytheft in a fully automated end-to-end attack from a co-locatedSGX enclave, despite all restrictions of SGX, e.g., no timers,no large pages, no physical addresses, and no shared memory.

We developed the most accurate timing measurement tech-nique currently known for Intel CPUs, perfectly tailored tothe hardware. We combined DRAM and cache side channels,to build a novel approach that recovers physical addressbits without assumptions on the page size. We attack theRSA implementation of mbedTLS that is used for instance inOpenVPN. The attack succeeds despite protection against side-channel attacks using a constant-time multiplication primitive.We extract 96% of a 4096-bit RSA private key from a singlePrime+Probe trace and achieve full key recovery from only11 traces within 5 minutes.

Besides not fully preventing malicious enclaves, SGX pro-vides protection features to conceal attack code. Even the mostadvanced detection mechanisms using performance counterscannot detect our malware. Intel intentionally does not includeSGX activity in the performance counters for security reasons.However, this unavoidably provides attackers with the abilityto hide attacks as it eliminates the only known techniqueto detect cache side-channel attacks. We discussed multipledesign issues in SGX and proposed countermeasures thatshould be considered for future versions.

ACKNOWLEDGMENTS

This project has received funding from the EuropeanResearch Council (ERC) under the European Union’sHorizon 2020 research and innovation programme

(grant agreement No 681402).This work was partially sup-ported by the TU Graz LEADproject ”Dependable Internetof Things in Adverse Environ-ments”.

REFERENCES

[1] Q. Ge, Y. Yarom, D. Cock, and G. Heiser, “A survey of microarchitec-tural timing attacks and countermeasures on contemporary hardware,”Cryptology ePrint Archive, Report 2016/613, 2016., Tech. Rep., 2016.

[2] G. Irazoqui, M. S. Inci, T. Eisenbarth, and B. Sunar, “Wait a minute! Afast, Cross-VM attack on AES,” in RAID’14, 2014.

[3] B. Gulmezoglu, M. S. Inci, T. Eisenbarth, and B. Sunar, “A Faster andMore Realistic Flush+Reload Attack on AES,” in Constructive Side-Channel Analysis and Secure Design (COSADE), 2015.

[4] M. S. Inci, B. Gulmezoglu, G. Irazoqui, T. Eisenbarth, and B. Sunar,“Seriously, get off my cloud! cross-vm rsa key recovery in a publiccloud,” Cryptology ePrint Archive, Report 2015/898, 2015., Tech. Rep.,2015.

[5] Intel Corporation, “Intel Software Guard Extensions (Intel SGX),” https://software.intel.com/en-us/sgx, 2016, retrieved on November 7, 2016.

[6] V. Costan and S. Devadas, “Intel sgx explained,” Cryptology ePrintArchive, Report 2016/086, Tech. Rep., 2016.

[7] Intel Corporation, “Hardening Password Managers with Intel SoftwareGuard Extensions: White Paper,” 2016.

[8] J. Rutkowska, “Thoughts on Intel’s upcoming Software Guard Exten-sions (Part 2),” http://theinvisiblethings.blogspot.co.at/2013/09/thoughts-on-intels-upcoming-software.html, September 2013, retrieved on Octo-ber 20, 2016.

[9] J.-P. Aumasson and L. Merino, “SGX Secure Enclavesin Practice: Security and Crypto Review,” in BlackHat 2016 Briefings, Aug. 2016. [Online]. Available:https://www.blackhat.com/docs/us-16/materials/us-16-Aumasson-SGX-Secure-Enclaves-In-Practice-Security-And-Crypto-Review-wp.pdf

[10] Docker, “What is docker?” https://www.docker.com/what-docker, 2016.[11] S. Arnautov, B. Trach, F. Gregor, T. Knauth, A. Martin, C. Priebe,

J. Lind, D. Muthukumaran, D. O’Keeffe, M. L. Stillwell et al., “Scone:Secure linux containers with intel sgx,” in 12th USENIX Symposium onOperating Systems Design and Implementation (OSDI 16), 2016.

[12] P. C. Kocher, “Timing Attacks on Implementations of Diffe-Hellman,RSA, DSS, and Other Systems,” in Crypto’96, 1996.

[13] D. Page, “Theoretical use of cache memory as a cryptanalytic side-channel,” Cryptology ePrint Archive, Report 2002/169, 2002.

[14] D. J. Bernstein, “Cache-timing attacks on AES,” Department of Math-ematics, Statistics, and Computer Science, University of Illinois atChicago, Tech. Rep., 2005.

[15] C. Percival, “Cache missing for fun and profit,” in Proceedings ofBSDCan, 2005.

[16] D. A. Osvik, A. Shamir, and E. Tromer, “Cache Attacks and Counter-measures: the Case of AES,” in CT-RSA 2006, 2006.

[17] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, You, Get Offof My Cloud: Exploring Information Leakage in Third-Party ComputeClouds,” in CCS’09, 2009.

[18] Y. Zhang, A. Juels, A. Oprea, and M. K. Reiter, “HomeAlone:Co-residency Detection in the Cloud via Side-Channel Analysis,” inS&P’11, 2011.

[19] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-Level CacheSide-Channel Attacks are Practical,” in S&P’15, 2015.

[20] G. Irazoqui, T. Eisenbarth, and B. Sunar, “S$A: A Shared CacheAttack that Works Across Cores and Defies VM Sandboxing – and itsApplication to AES,” in S&P’15, 2015.

[21] C. Maurice, M. Weber, M. Schwarz, L. Giner, D. Gruss, C. A. Boano,S. Mangard, and K. Romer, “Hello from the Other Side: SSH overRobust Cache Covert Channels in the Cloud,” in NDSS’17, 2017, toappear.

[22] D. Gruss, C. Maurice, and S. Mangard, “Rowhammer.js: A RemoteSoftware-Induced Fault Attack in JavaScript,” in DIMVA’16, 2016.

[23] Y. Yarom and K. Falkner, “Flush+Reload: a High Resolution, LowNoise, L3 Cache Side-Channel Attack,” in USENIX Security Symposium,2014.

[24] D. Gruss, R. Spreitzer, and S. Mangard, “Cache Template Attacks: Au-tomating Attacks on Inclusive Last-Level Caches,” in USENIX SecuritySymposium, 2015.

[25] G. Irazoqui, M. S. Inci, T. Eisenbarth, and B. Sunar, “Know thyneighbor: Crypto library detection in cloud,” Proceedings on PrivacyEnhancing Technologies, vol. 1, no. 1, pp. 25–40, 2015.

[26] M. Lipp, D. Gruss, R. Spreitzer, C. Maurice, and S. Mangard, “AR-Mageddon: Cache Attacks on Mobile Devices,” in USENIX SecuritySymposium, 2016.

[27] N. Benger, J. van de Pol, N. P. Smart, and Y. Yarom, ““ooh aah... justa little bit”: A small amount of side channel can go a long way,” inCHES’14, 2014.

[28] M. S. Inci, B. Gulmezoglu, G. Irazoqui, T. Eisenbarth, and B. Sunar,“Cache attacks enable bulk key recovery on the cloud,” in CHES’16,2016.

[29] G. Irazoqui, T. Eisenbarth, and B. Sunar, “Cross processor cacheattacks,” in Proceedings of the 11th ACM on Asia Conference onComputer and Communications Security (AsiaCCS’16), 2016.

https://software.intel.com/en-us/sgx

https://software.intel.com/en-us/sgx

http://theinvisiblethings.blogspot.co.at/2013/09/thoughts-on-intels-upcoming-software.html

http://theinvisiblethings.blogspot.co.at/2013/09/thoughts-on-intels-upcoming-software.html

https://www.blackhat.com/docs/us-16/materials/us-16-Aumasson-SGX-Secure-Enclaves-In-Practice-Security-And-Crypto-Review-wp.pdf

https://www.blackhat.com/docs/us-16/materials/us-16-Aumasson-SGX-Secure-Enclaves-In-Practice-Security-And-Crypto-Review-wp.pdf

https://www.docker.com/what-docker

[30] P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard, “DRAMA:Exploiting DRAM Addressing for Cross-CPU Attacks,” in USENIXSecurity Symposium, 2016.

[31] “Intel Software Guard Extensions SDK for Linux OS Developer Refer-ence,” May 2016, rev 1.5.

[32] Y. Xu, W. Cui, and M. Peinado, “Controlled-Channel Attacks: Deter-ministic Side Channels for Untrusted Operating Systems,” in S&P’15,May 2015.

[33] I. Anati, F. McKeen, S. Gueron, H. Huang, S. Johnson, R. Leslie-Hurd,H. Patil, C. V. Rozas, and H. Shafi, “Intel Software Guard Extensions(Intel SGX),” 2015, tutorial Slides presented at ICSA 2015.

[34] ARMmbed, “Reduce mbed TLS memory and storage footprint,”https://tls.mbed.org/kb/how-to/reduce-mbedtls-memory-and-storage-footprint, February 2016, retrieved on October 24, 2016.

[35] C. Arnaud and P.-A. Fouque, “Timing attack against protected rsa-crtimplementation used in polarssl,” in CT-RSA 2013, 2013.

[36] O. Acıicmez and W. Schindler, “A vulnerability in rsa implementationsdue to instruction cache analysis and its demonstration on openssl,” inCT-RSA 2008, 2008.

[37] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Cross-VM sidechannels and their use to extract private keys,” in CCS’12, 2012.

[38] D. Boneh, G. Durfee, and Y. Frankel, “An attack on rsa given a smallfraction of the private key bits,” in International Conference on theTheory and Application of Cryptology and Information Security, 1998.

[39] J. Blomer and A. May, “New partial key exposure attacks on rsa,” inCrypto’03, 2003.

[40] J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul,J. A. Calandrino, A. J. Feldman, J. Appelbaum, and E. W. Felten, “Lestwe remember: cold-boot attacks on encryption keys,” Communicationsof the ACM, May 2009.

[41] N. Heninger and H. Shacham, Reconstructing RSA Private Keys fromRandom Key Bits, 2009.

[42] Docker, “Amazon web services - docker,” https://docs.docker.com/machine/drivers/aws/, 2016.

[43] Microsoft, “Create a docker environment in azure using thedocker vm extension,” https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/, Oct. 2016.

[44] Y. Li, J. McCune, J. Newsome, A. Perrig, B. Baker, and W. Drewry,“Minibox: A two-way sandbox for x86 native code,” in 2014 USENIXannual technical conference (USENIX ATC 14), 2014.

[45] A. Baumann, M. Peinado, and G. Hunt, “Shielding applications from anuntrusted cloud with haven,” ACM Transactions on Computer Systems(TOCS), 2015.

[46] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich, “Vc3: trustworthy data analytics in the cloudusing sgx,” 2015.

[47] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, and A. Hamzeh, “A surveyon heuristic malware detection techniques,” in The 5th Conference onInformation and Knowledge Technology. Institute of Electrical andElectronics Engineers (IEEE), May 2013.

[48] O. Sukwong, H. Kim, and J. Hoe, “Commercial antivirus softwareeffectiveness: An empirical study,” Computer, Mar. 2011.

[49] J. Demme, M. Maycock, J. Schmitz, A. Tang, A. Waksman, S. Sethu-madhavan, and S. Stolfo, “On the feasibility of online malware detectionwith performance counters,” ACM SIGARCH Computer ArchitectureNews, vol. 41, no. 3, pp. 559–570, 2013.

[50] N. Herath and A. Fogh, “These are Not Your Grand DaddysCPU Performance Counters – CPU Hardware PerformanceCounters for Security,” in Black Hat 2015 Briefings, 2015.[Online]. Available: https://www.blackhat.com/docs/us-15/materials/us-15-Herath-These-Are-Not-Your-Grand-Daddys-CPU-Performance-Counters-CPU-Hardware-Performance-Counters-For-Security.pdf

[51] Intel, “Intel R© 64 and IA-32 Architectures Software Developer’s Manual,Volume 3 (3A, 3B & 3C): System Programming Guide,” vol. 253665,2014.

[52] ——, “Intel R© 64 and IA-32 Architectures Optimization ReferenceManual,” 2014.

[53] A. Fog, The microarchitecture of Intel, AMD and VIA CPUs: An opti-mization guide for assembly programmers and compiler makers, 2016,http://www.agner.org/optimize/microarchitecture.pdf (accessed version:2016-01-16).

[54] Y. Oren, V. P. Kemerlis, S. Sethumadhavan, and A. D. Keromytis, “TheSpy in the Sandbox: Practical Cache Attacks in JavaScript and theirImplications,” in CCS’15, 2015.

[55] Intel Corporation, “Intel(R) Software Guard Extensions for Linux* OS,”https://github.com/01org/linux-sgx-driver, 2016, retrieved on November11, 2016.

[56] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in Inter-national conference on Tools and Algorithms for the Construction andAnalysis of Systems. Springer, 2008, pp. 337–340.

[57] C. Maurice, N. Le Scouarnec, C. Neumann, O. Heen, and A. Francillon,“Reverse Engineering Intel Complex Addressing Using PerformanceCounters,” in RAID’15, 2015.

[58] J. G. van Woudenberg, M. F. Witteman, and B. Bakker, “Improvingdifferential power analysis by elastic alignment,” in CT-RSA 2011, 2011.

[59] R. A. Muijrers, J. G. van Woudenberg, and L. Batina, “Ram: Rapidalignment method,” in International Conference on Smart Card Researchand Advanced Applications, 2011.

[60] V. I. Levenshtein, “Binary codes capable of correcting deletions, inser-tions and reversals,” in Soviet physics doklady, vol. 10, 1966, p. 707.

[61] E. B. Barker and A. L. Roginsky, “Transitions: Recommendation fortransitioning the use of cryptographic algorithms and key lengths,” NISTSP 800-131A Revision 1, Nov. 2015.

[62] C. D. Walter, “Longer keys may facilitate side channel attacks,” inInternational Workshop on Selected Areas in Cryptography, 2003.

[63] Y. Yarom and N. Benger, “Recovering openssl ecdsa nonces usingthe flush+ reload cache side-channel attack.” IACR Cryptology ePrintArchive, 2014.

[64] C. Pereida Garcıa, B. B. Brumley, and Y. Yarom, “Make sure dsa signingexponentiations really are constant-time,” in Proceedings of the 2016ACM SIGSAC Conference on Computer and Communications Security,2016.

[65] T. Allan, B. B. Brumley, K. Falkner, J. V. D. Pol, and Y. Yarom, “Am-plifying Side Channels Through Performance Degradation,” CryptologyePrint Archive: Report 2015/1141, 2015.

[66] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated perfor-mance comparison of virtual machines and linux containers,” in 2015IEEE International Symposium On Performance Analysis of Systems andSoftware (ISPASS), 2015.

[67] Intel Corporation, “kvm-sgx wiki,” https://github.com/01org/kvm-sgx/wiki, 2016, retrieved on November 11, 2016.

[68] W. Schindler, “Exclusive exponent blinding may not suffice to preventtiming attacks on rsa,” in International Workshop on CryptographicHardware and Embedded Systems, 2015.

[69] E. Biham, “A fast new des implementation in software,” in InternationalWorkshop on Fast Software Encryption, 1997, pp. 260–272.

[70] M. Matsui, “How far can we go on the x64 processors?” in InternationalWorkshop on Fast Software Encryption. Springer, 2006, pp. 341–358.

[71] M. Sudhakar, R. V. Kamala, and M. Srinivas, “A bit-sliced, scalable andunified montgomery multiplier architecture for rsa and ecc,” in 2007IFIP International Conference on Very Large Scale Integration, 2007,pp. 252–257.

[72] E. Kasper and P. Schwabe, “Faster and timing-attack resistant AES-GCM,” in Cryptographic Hardware and Embedded Systems (CHES),2009, pp. 1–17.

[73] D. Gullasch, E. Bangerter, and S. Krenn, “Cache Games – BringingAccess-Based Cache Attacks on AES to Practice,” in S&P’11, 2011.

[74] W.-M. Hu, “Reducing timing channels with fuzzy time,” Journal ofcomputer security, 1992.

[75] B. C. Vattikonda, S. Das, and H. Shacham, “Eliminating fine grainedtimers in xen,” in Proceedings of the 3rd ACM workshop on Cloudcomputing security workshop (CCSW’11), 2011.

[76] R. Martin, J. Demme, and S. Sethumadhavan, “Timewarp: rethinkingtimekeeping and performance monitoring mechanisms to mitigate side-channel attacks,” ACM SIGARCH Computer Architecture News, 2012.

[77] M. Chiappetta, E. Savas, and C. Yilmaz, “Real time detection ofcache-based side-channel attacks using hardware performance counters,”Cryptology ePrint Archive, Report 2015/1034, 2015.

[78] D. Gruss, C. Maurice, K. Wagner, and S. Mangard, “Flush+Flush: AFast and Stealthy Cache Attack,” in DIMVA’16, 2016.

[79] M. Payer, “HexPADS: a platform to detect “stealth” attacks,” in ES-SoS’16, 2016.

[80] Intel Corporation, “Intel SGX: Debug, Production, Pre-release what’sthe difference?” https://software.intel.com/en-us/blogs/2016/01/07/intel-sgx-debug-production-prelease-whats-the-difference, January 2016, re-trieved on October 24, 2016.

https://tls.mbed.org/kb/how-to/reduce-mbedtls-memory-and-storage-footprint

https://tls.mbed.org/kb/how-to/reduce-mbedtls-memory-and-storage-footprint

https://docs.docker.com/machine/drivers/aws/

https://docs.docker.com/machine/drivers/aws/

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/

https://www.blackhat.com/docs/us-15/materials/us-15-Herath-These-Are-Not-Your-Grand-Daddys-CPU-Performance-Counters-CPU-Hardware-Performance-Counters-For-Security.pdf



http://www.agner.org/optimize/microarchitecture.pdf

https://github.com/01org/linux-sgx-driver

https://github.com/01org/kvm-sgx/wiki

https://github.com/01org/kvm-sgx/wiki

https://software.intel.com/en-us/blogs/2016/01/07/intel-sgx-debug-production-prelease-whats-the-difference

https://software.intel.com/en-us/blogs/2016/01/07/intel-sgx-debug-production-prelease-whats-the-difference

[81] H. Raj, R. Nathuji, A. Singh, and P. England, “Resource Managementfor Isolation Enhanced Cloud Services,” in Proceedings of the 1st ACMCloud Computing Security Workshop (CCSW’09), 2009, pp. 77–84.

[82] M. M. Godfrey and M. Zulkernine, “Preventing cache-based side-channel attacks in a cloud environment,” IEEE Transactions on CloudComputing, Oct. 2014.

[83] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B.Lee, “Catalyst: Defeating last-level cache side channel attacks in cloudcomputing,” in IEEE International Symposium on High Performance

Computer Architecture (HPCA’16), 2016.

malware guard extension: using sgx to conceal …at this point anti-virus scanners can hook in to...

Documents