Download - Shimin Chen LBA Reading Group

1

Complete Information Flow Tracking from the Gates Up

Tiwari, Wassel, Mazloom, Mysore, Chong, Sherwood, UCSB, ASPLOS 2009

Shimin Chen

LBA Reading Group

2

Introduction

• In a traditional microprocessor, information is leaked practically everywhere and by everything
• Can be a serious problem for exceptionally sensitive financial, military, and personal data
  • Cryptography, authentication
• Developers in these domains are willing to go to remarkable lengths to minimize the amount of leaked information:
  • flushing the cache before and after executing a piece of critical code (Osvik et al. 2006)
  • attempting to scrub the branch predictor state (Aciicmez et al. 2007)
  • normalizing the execution time of loops by hand (Kocher 1996)
  • randomizing or prioritizing the placement of data into the cache (Lee et al. 2005)
• Previous work on DIFT (dynamic information flow tracking) is not adequate

3

GLIFT: Gate-Level Information-Flow Tracking

• This paper presents a processor architecture and implementation that can track all information flows
• A novel logic discipline: GLIFT logic
  • Augment arbitrary logic blocks with tracking logic
  • Make compositions of augmented blocks
• Synthesizable processor implementation with a restricted ISA
  • Provably sound information-flow tracking
  • Allows tasks such as public-key cryptography and message authentication

4

Theoretical Understanding

• In a Turing-complete machine, the general problem of determining whether information flows in a program from variable x to variable y is undecidable: “any procedure purported to decide it could be applied to the statement if f(x) halts then y := 0 and thus provide a solution to the halting problem for arbitrary recursive function” (Denning and Denning 1977).
• The paper builds a machine:
  • By construction, it will not allow unbounded execution
  • All hidden flows of information are made explicit

5

Outline

• Introduction
• Gate Level Information Flow Tracking
• Architecture
• Evaluation
• Conclusions

6

Idea

• Understand how information flows through primitive logic gates
• Compose these gates together into more complex structures
• Treat the whole processor as a logical function
  • Operates on a set of inputs
  • Results in a set of outputs
  • The trust of outputs should be determined based on the trust of inputs
• Assumption:
  • Binary state: trusted (0) or untrusted (1)

7

GLIFT for an AND gate

[Figures: AND gate; AND gate truth table; shadow logic for the AND gate; partial truth table for the shadow logic]
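The shadow logic for the AND gate can be sketched in Python. This is a minimal sketch derived from first principles under the slide's convention (0 = trusted, 1 = untrusted), not a copy of the paper's figure; the names `and_shadow` and `and_shadow_exact` are illustrative assumptions.

```python
from itertools import product

def and_shadow(a, a_t, b, b_t):
    """Taint bit of o = a AND b (0 = trusted, 1 = untrusted).

    The output is untrusted only if an untrusted input can actually
    change it: a trusted 0 on either input forces o = 0.
    """
    return (a_t & b_t) | (a_t & b) | (b_t & a)

def and_shadow_exact(a, a_t, b, b_t):
    """Reference definition: o is untrusted iff varying the untrusted inputs can change it."""
    a_vals = (0, 1) if a_t else (a,)
    b_vals = (0, 1) if b_t else (b,)
    outputs = {x & y for x, y in product(a_vals, b_vals)}
    return int(len(outputs) > 1)

# Print the full shadow truth table: a, a_t, b, b_t -> o_t
for a, a_t, b, b_t in product((0, 1), repeat=4):
    print(a, a_t, b, b_t, "->", and_shadow(a, a_t, b, b_t))
```

The key row is a trusted 0 paired with an untrusted input: the output is 0 regardless of the untrusted value, so it stays trusted. A tracker that simply ORs the input taints would mark it untrusted.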

8

Composing Larger Functions

• Use MUX as a simple example

• The shadow logic can be composed from shadow logics of gates

• The composed shadow logic is not minimal but is always sound; for example, it ignores the fact that the two inputs to the OR gate can never both be 1

• If S is trusted and the selected input is trusted, o is trusted

• If S is untrusted, o is untrusted unless both a and b are trusted and are equal
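The gap between composed and precise shadow logic can be checked by brute force. The sketch below assumes the MUX is built as o = (a AND NOT s) OR (b AND s) and that taint 1 means untrusted; all function names are illustrative, not taken from the paper.

```python
from itertools import product

def and_t(a, a_t, b, b_t):
    # A trusted 0 on either input forces the output to 0.
    return (a_t & b_t) | (a_t & b) | (b_t & a)

def or_t(a, a_t, b, b_t):
    # Dual of AND: a trusted 1 on either input forces the output to 1.
    return (a_t & b_t) | (a_t & (1 - b)) | (b_t & (1 - a))

def mux_composed_t(a, a_t, b, b_t, s, s_t):
    """Taint of o = (a AND NOT s) OR (b AND s), built gate by gate."""
    ns, ns_t = 1 - s, s_t                  # NOT passes taint through unchanged
    x, x_t = a & ns, and_t(a, a_t, ns, ns_t)
    y, y_t = b & s, and_t(b, b_t, s, s_t)
    return or_t(x, x_t, y, y_t)

def mux_exact_t(a, a_t, b, b_t, s, s_t):
    """Precise taint: can the untrusted inputs change the output at all?"""
    choices = [(0, 1) if t else (v,) for v, t in ((a, a_t), (b, b_t), (s, s_t))]
    outputs = {(y if sel else x) for x, y, sel in product(*choices)}
    return int(len(outputs) > 1)

cases = list(product((0, 1), repeat=6))
sound = all(mux_composed_t(*c) >= mux_exact_t(*c) for c in cases)
imprecise = [c for c in cases if mux_composed_t(*c) > mux_exact_t(*c)]
print("sound:", sound, "| over-tainted cases:", len(imprecise))
```

The composed version never reports trusted when the precise version reports untrusted, which is the soundness property. It does over-taint, for example when S is untrusted but a and b are trusted and equal.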

9


10

Step 1: Handling Conditionals

• Problem with conventional architecture:
  • If X is untrusted, then PC becomes untrusted
  • Selected instruction becomes untrusted
  • Bits that select target register are untrusted
  • All of the registers may be marked as untrusted
• Must keep PC trusted

11

Solution: Predication

• All the instructions are executed
• If predicate is 0, the instruction does not have effects: target register is not overwritten
• PC is trusted
• Predicates can become untrusted
• Suppose P0 is untrusted

12

Example

• The line selecting R2 is untrusted
• The other control lines are trusted
• R2 will be marked untrusted whether P0 = 0 or 1
• End result: whether the untrusted predicate is true or not, the destination is marked as untrusted.
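The effect of an untrusted predicate on the register file can be sketched as follows. This is a conservative software model, not the gate-level logic of the paper: the function `predicated_move` and the dictionary-based register file are assumptions made for illustration.

```python
def predicated_move(regs, taint, dst, src, pred, pred_t):
    """Model `if (pred) regs[dst] = regs[src]` as a predicated instruction.

    The instruction always executes, so the PC and the other control
    lines stay trusted; only the write to `dst` depends on `pred`.
    """
    if pred:
        regs[dst] = regs[src]
        taint[dst] = taint[src]
    # An untrusted predicate taints the destination whether or not it fired.
    taint[dst] |= pred_t

for p0 in (0, 1):
    regs = {"R0": 5, "R1": 7, "R2": 9}
    taint = {"R0": 0, "R1": 0, "R2": 0}
    predicated_move(regs, taint, dst="R2", src="R1", pred=p0, pred_t=1)
    print("P0 =", p0, regs, taint)
```

In both runs only R2 ends up untrusted; R0 and R1 keep their trusted status, in contrast to the conventional branch, where an untrusted PC can spread to every register.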


13

Step 2: Handling Loops

• Loops are hard

  for (i=0; i<=X; i++) A[i]=1;

• Information flow from X to A[X+1]
  • A[X+1]==0 tells us about X
• Information flow from X to A[X+n] for all n
• Implicit timing channel
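The implicit flow from X into the untouched cells of A can be made concrete with a small Python model of the loop. The array size and the helper names `fill` and `recover_x` are assumptions made for this sketch.

```python
def fill(x, size=8):
    """Run `for (i = 0; i <= X; i++) A[i] = 1;` on a zeroed array of `size` cells."""
    a = [0] * size
    for i in range(x + 1):
        a[i] = 1
    return a

def recover_x(a):
    """An observer who sees only A recovers X from the cells that were NOT written."""
    return a.index(0) - 1 if 0 in a else len(a) - 1

for secret in range(4):
    a = fill(secret)
    print(secret, a, "->", recover_x(a))
```

The observer never reads X, yet learns it exactly, because the cells the loop did not write carry as much information as the ones it did.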

14

Solution: Statically Specify Number of Iterations

• countjump instruction specifies:
  • number of loop iterations
  • jump target address
• Example (my understanding from the description)

  Loop start address:
      …………
      countjump # iterations, loop start address

  • The first time countjump is encountered, the # iterations is loaded into an internal loop counter register
  • The loop counter register is decremented every time countjump is encountered, and PC ← loop start address
  • When the register becomes 0, PC ← PC + 1
• countjump cannot be predicated
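The countjump behaviour described above can be sketched as a tiny interpreter. The instruction encoding, the single loop counter, and the exact point at which the counter is tested are assumptions (one possible reading of the slide), not the paper's specification.

```python
def run(program, max_steps=1000):
    """Tiny interpreter for a program with one `countjump` loop.

    `program` is a list of ("op", name) or ("countjump", iterations, target).
    Assumed semantics: the counter is loaded on the first encounter and
    decremented on every encounter; the jump is taken while the counter is
    still non-zero, so the body runs exactly `iterations` times.
    """
    pc, counter, trace = 0, None, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        if instr[0] == "countjump":
            _, iterations, target = instr
            if counter is None:          # first encounter: load the counter
                counter = iterations
            counter -= 1
            if counter > 0:
                pc = target              # PC <- loop start address
            else:
                counter = None           # loop finished: PC <- PC + 1
                pc += 1
        else:
            trace.append(instr[1])
            pc += 1
    return trace

program = [("op", "init"), ("op", "body"), ("countjump", 3, 1), ("op", "done")]
print(run(program))
```

Because the trip count is an immediate in the instruction, the sequence of PC values is fixed by the program text alone and cannot depend on any untrusted data.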

15

Early Termination

• In “C”, we have the “break” statement, which can terminate a loop early
• Here, the paper proposes:
  • Predicate all the instructions in the loop with the termination condition
  • When the termination condition becomes true, the loop body does not have effects
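Replacing `break` with predication can be illustrated with a linear search. This is a sketch under the assumption that the trip count is fixed in advance; the Python conditional expression stands in for a predicated move, and the name `find_first` is hypothetical.

```python
def find_first(values, key, n_iterations):
    """Return (index of the first `key` in `values` or -1, iterations executed).

    C code would `break` on a match; here the loop always runs
    `n_iterations` times and a predicate disables the body instead.
    """
    found = 0          # termination condition
    index = -1
    steps = 0
    for i in range(n_iterations):
        steps += 1
        hit = int(values[i] == key)
        active = (1 - found) & hit        # body takes effect only before termination
        index = i if active else index    # stands in for a predicated move
        found |= hit
    return index, steps

print(find_first([4, 7, 7, 1], 7, 4))
print(find_first([4, 7, 7, 1], 9, 4))
```

Both calls execute the same number of iterations, so the running time no longer reveals where (or whether) the key was found.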

16

Step 3: Constraining Loads and Stores

• Indirect loads and stores are bad
  • e.g., M[reg] ← value
  • If reg is untrusted, then essentially all the memory locations become untrusted
  • “Intuitively, the problem is that accessing one untrusted address causes every other address to become implicitly untrusted by virtue of them not being accessed or modified.”
• Limit the ISA to only allow:
  • Direct load/store: addresses are immediate constants
  • Loop-relative addressing: load-looprel, store-looprel
    • e.g., load-looprel R0, 0x100, C0
    • Loads M[0x100 + C0]
    • C0..C7 are counters: explicitly initialized by init-counter, and incremented by a fixed value w/ increment-counter
    • counter operations cannot be predicated
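Loop-relative addressing can be modelled with a toy memory class. The class name, the method names, and the choice to fix the increment step at initialization time are assumptions made for illustration; the slides do not say where the fixed increment is specified.

```python
class LoopRelMemory:
    """Toy model of loop-relative addressing (all names are illustrative)."""

    def __init__(self, size=1024, n_counters=8):
        self.mem = [0] * size
        self.counters = [0] * n_counters    # C0..C7
        self.steps = [0] * n_counters

    def init_counter(self, c, start, step):
        # Both arguments are immediates in the program text, never register values.
        self.counters[c], self.steps[c] = start, step

    def increment_counter(self, c):
        self.counters[c] += self.steps[c]

    def load_looprel(self, base, c):
        return self.mem[base + self.counters[c]]

    def store_looprel(self, base, c, value):
        self.mem[base + self.counters[c]] = value

m = LoopRelMemory()
m.init_counter(0, start=0, step=1)
for _ in range(4):                  # fixed trip count, as with countjump
    m.store_looprel(0x100, 0, 1)    # M[0x100 + C0] <- 1
    m.increment_counter(0)
print(m.mem[0x100:0x104], m.counters[0])
```

Every address is an immediate plus a counter that only immediates can change, so the sequence of addresses touched is fixed by the program and never depends on untrusted register contents.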

17

Proof-of-Concept Implementation

• Verilog
• Use Altera’s Quartus II software to synthesize it onto a Stratix II FPGA
• 32-bit machine
• 64KB instruction memory, 64KB data memory
• Registers:
  • A program counter
  • 8 general purpose registers
  • 2 predicate registers
  • 8 registers to store loop counters (that count down the number of iterations)
  • 8 other registers to store explicit array indices (used as offsets for load-looprel and store-looprel instructions)
• No pipelining

18

Augment the Processor with GLIFT Logic

• Each bit of processor state is explicitly shadowed:
  • every register gets a shadow register
  • every memory has a shadow RAM
• The logic and signals are shadowed by generating the proper trust propagation logic

19

ISA

20

A code snippet from the SubBytes function in the AES encryption algorithm

Basically this is the following in “C”:

for (i=0; i<16; i++) { state[i] = SBox[state[i]]; }

21


22

Hardware Impact

Altera’s Nios is a commercial product: RISC instruction set, reasonably optimized

Nios econ: unpipelined 6 stage core, without caches, branch-predictors etc.

Nios std: pipelined, 4KB instruction cache

GLIFT base: unpipelined, no tracking

GLIFT full: GLIFT base + tracking

23

Hardware Impact

70% area increase compared to GLIFT base

Small frequency degradation: adding GLIFT tracking does not have a big impact on latency

24

Application Kernels

Dynamic instruction counts vary substantially

• FSM and AES have a lot of table look-ups, which become full table iterations
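Why a table look-up turns into a full table iteration can be sketched as follows. With only direct and loop-relative addressing, a data-dependent index cannot be used as an address, so every entry is visited and a predicate selects the one to keep. The function name and the placeholder table (not the real AES S-box) are assumptions for illustration.

```python
def oblivious_lookup(table, index):
    """Compute table[index] by touching every entry in a fixed order."""
    result = 0
    for i in range(len(table)):          # trip count depends only on the table size
        match = int(i == index)          # predicate derived from the untrusted index
        mask = -match                    # 0 or all ones
        result = (result & ~mask) | (table[i] & mask)
    return result

# Placeholder 256-entry table, NOT the real AES S-box.
sbox = [(i * 7 + 3) % 256 for i in range(256)]
state = [0, 17, 255]
state = [oblivious_lookup(sbox, b) for b in state]
print(state)
```

Each look-up costs one pass over all 256 entries instead of a single load, which is consistent with the large dynamic instruction counts reported for the look-up-heavy kernels.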

25

Conclusions

• Bigger, slower, harder to program, and computationally less powerful
• For the first time, provides the ability to account for all information flows through the chip.
• My learning:
  • A deeper understanding of information leaks
  • Efforts to prevent leaks are very significant
    • Sacrifice programmability: restrictions on loops and load/store
    • Proof-of-concept does not even talk about issues such as cache
