Complete Information Flow Tracking from the Gates Up
Tiwari, Wassel, Mazloom, Mysore, Chong, Sherwood, UCSB, ASPLOS 2009
Shimin Chen
LBA Reading Group
Introduction
• In a traditional microprocessor, information is leaked practically everywhere and by everything
• This can be a serious problem for exceptionally sensitive financial, military, and personal data
  • Cryptography, authentication
• Developers in these domains are willing to go to remarkable lengths to minimize the amount of leaked information:
  • flushing the cache before and after executing a piece of critical code (Osvik et al. 2006)
  • attempting to scrub the branch predictor state (Aciicmez et al. 2007)
  • normalizing the execution time of loops by hand (Kocher 1996)
  • randomizing or prioritizing the placement of data into the cache (Lee et al. 2005)
• Previous work on dynamic information flow tracking (DIFT) is not adequate
GLIFT: Gate-Level Information-Flow Tracking
• This paper presents a processor architecture and implementation that can track all information flows
• A novel logic discipline: GLIFT logic
  • Augment arbitrary logic blocks with tracking logic
  • Make compositions of augmented blocks
• Synthesizable processor implementation with a restricted ISA
  • Provably sound information-flow tracking
  • Allows tasks such as public-key cryptography and message authentication
Theoretical Understanding
• In a Turing-complete machine, the general problem of determining whether information flows in a program from variable x to variable y is undecidable:
  • “any procedure purported to decide it could be applied to the statement if f(x) halts then y := 0 and thus provide a solution to the halting problem for arbitrary recursive function” (Denning and Denning 1977).
• The paper builds a machine in which:
  • by construction, unbounded execution is not allowed
  • all hidden flows of information are made explicit
Outline
• Introduction
• Gate Level Information Flow Tracking
• Architecture
• Evaluation
• Conclusions
Idea
• Understand how information flows through primitive logic gates
• Compose these gates together into more complex structures
• Treat the whole processor as a logical function
  • Operates on a set of inputs
  • Results in a set of outputs
  • The trust of the outputs should be determined from the trust of the inputs
• Assumption:
  • Binary state: trusted (0) or untrusted (1)
GLIFT for an AND gate
Figure panels: AND gate; AND gate truth table; shadow logic for the AND gate; partial truth table for the shadow logic
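The figure itself did not survive extraction, so here is a minimal sketch of what AND-gate shadow logic can look like, assuming the deck's encoding (0 = trusted, 1 = untrusted). The names `and_shadow` and `untrusted_inputs_can_change_output` are illustrative and are not taken from the paper. The rule encoded here is that the output is untrusted only when an untrusted input could actually change it.

```python
from itertools import product

def and_gate(a, b):
    return a & b

def and_shadow(a, a_t, b, b_t):
    """Trust bit of (a AND b); 0 = trusted, 1 = untrusted."""
    # Untrusted if both inputs are untrusted, or if one input is untrusted
    # while the other input is 1 and therefore lets it through.
    return (a_t & b_t) | (a_t & b) | (b_t & a)

def untrusted_inputs_can_change_output(a, a_t, b, b_t):
    """Reference check: vary only the untrusted inputs and see if the output moves."""
    a_vals = (0, 1) if a_t else (a,)
    b_vals = (0, 1) if b_t else (b,)
    outputs = {and_gate(x, y) for x, y in product(a_vals, b_vals)}
    return int(len(outputs) > 1)

# The shadow formula agrees with the reference check on all 16 input combinations.
for a, a_t, b, b_t in product((0, 1), repeat=4):
    assert and_shadow(a, a_t, b, b_t) == untrusted_inputs_can_change_output(a, a_t, b, b_t)
```

Note the masking effect: a trusted 0 on one input makes the output trusted even when the other input is untrusted, so `and_shadow(0, 0, 1, 1)` evaluates to 0.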
Composing Larger Functions
• Use MUX as a simple example
• The shadow logic can be composed from shadow logics of gates
• The composed shadow logic is not minimal, but it is always sound. For example, it does not exploit the fact that the two inputs to the OR gate can never both be 1
• If S is trusted and the selected input is trusted, o is trusted
• If S is untrusted, o is untrusted unless both a and b are trusted and are equal
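The composition idea can be sketched as follows, assuming a MUX built as `(a AND NOT s) OR (b AND s)` with `s = 0` selecting `a` (the select polarity and all function names are assumptions, not taken from the paper). The sketch composes per-gate shadow functions and then checks soundness against an exhaustive reference: whenever untrusted inputs can really change the output, the composed logic must report untrusted. The composed version may be more conservative than the reference for some inputs, which is what "not minimal" means.

```python
from itertools import product

def and_shadow(a, a_t, b, b_t):
    return (a_t & b_t) | (a_t & b) | (b_t & a)

def or_shadow(a, a_t, b, b_t):
    # Dual of AND: a trusted 1 on one input masks the other input.
    return (a_t & b_t) | (a_t & (1 - b)) | (b_t & (1 - a))

def not_shadow(a, a_t):
    return a_t  # inversion never changes trust

def mux(a, b, s):
    return (a & (1 - s)) | (b & s)

def mux_shadow_composed(a, a_t, b, b_t, s, s_t):
    """Shadow logic obtained by wiring together the gate-level shadows."""
    ns, ns_t = 1 - s, not_shadow(s, s_t)
    x, x_t = a & ns, and_shadow(a, a_t, ns, ns_t)
    y, y_t = b & s, and_shadow(b, b_t, s, s_t)
    return or_shadow(x, x_t, y, y_t)

def mux_shadow_exhaustive(a, a_t, b, b_t, s, s_t):
    """Reference: can varying only the untrusted inputs change the MUX output?"""
    choices = [(0, 1) if t else (v,) for v, t in ((a, a_t), (b, b_t), (s, s_t))]
    outputs = {mux(*vals) for vals in product(*choices)}
    return int(len(outputs) > 1)

def composed_is_sound():
    for args in product((0, 1), repeat=6):
        if mux_shadow_exhaustive(*args) and not mux_shadow_composed(*args):
            return False
    return True

assert composed_is_sound()
```

Comparing `mux_shadow_composed` and `mux_shadow_exhaustive` over all 64 input combinations is a quick way to see where the composed logic gives up precision.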
Step 1: Handling Conditionals
• Problem with a conventional architecture
  • If X is untrusted, then the PC becomes untrusted
  • The selected instruction becomes untrusted
  • The bits that select the target register are untrusted
  • All of the registers may be marked as untrusted
• Must keep the PC trusted
Solution: Predication
• All the instructions are executed
• If the predicate is 0, the instruction has no effect: the target register is not overwritten
• The PC is trusted
• Predicates can become untrusted
  • Suppose P0 is untrusted
Example
• The line selecting R2 is untrusted
• The other control lines are trusted
• R2 will be marked untrusted whether P0 = 0 or 1
• End result: whether or not the untrusted predicate is true, the destination (target) register is marked as untrusted.
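The effect described in this example can be modeled at register granularity. This is a simplified sketch, not the gate-level logic from the paper: the function `predicated_move` and the list-based register file are hypothetical, and trust is tracked per register rather than per bit.

```python
def predicated_move(regs, shadow, dst, src, pred, pred_untrusted):
    """Model of 'if pred: regs[dst] = regs[src]' with trust tracking.

    shadow[i] is 1 when register i is untrusted, 0 when trusted.
    The instruction is always fetched and executed, so the PC stays trusted.
    """
    old_val, old_t = regs[dst], shadow[dst]
    new_val, new_t = regs[src], shadow[src]
    regs[dst] = new_val if pred else old_val
    if pred_untrusted:
        # Whether dst was overwritten depends on untrusted data,
        # so dst is untrusted in both cases.
        shadow[dst] = 1
    else:
        shadow[dst] = new_t if pred else old_t

# With an untrusted predicate, R2 ends up untrusted for P0 = 0 and for P0 = 1.
for p0 in (0, 1):
    regs, shadow = [0, 5, 7], [0, 0, 0]
    predicated_move(regs, shadow, dst=2, src=1, pred=p0, pred_untrusted=1)
    assert shadow[2] == 1
    assert shadow[0] == 0 and shadow[1] == 0  # other registers stay trusted
```

The point of the design is visible in the last assertion: only the destination register is affected, instead of every register as in the branch-based case.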
Step 2: Handling Loops
• Loops are hard
  • for (i=0; i<=X; i++) A[i]=1;
• Information flow from X to A[X+1]
  • A[X+1]==0 tells us about X
• Information flow from X to A[X+n] for all n
• Implicit timing channel
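A small sketch of why the untouched elements leak: an observer who sees only the array, and never X, can still recover X from the position of the first element that was not written. The helper names and the array size are illustrative assumptions.

```python
def run_loop(x, size):
    """Python rendering of: for (i=0; i<=X; i++) A[i]=1; on a zeroed array."""
    a = [0] * size
    for i in range(x + 1):
        a[i] = 1
    return a

def recover_x(a):
    """Infer X purely from the array contents: the first 0 sits at index X+1."""
    return a.index(0) - 1

# The array must be longer than X+1 so that at least one element stays 0.
for secret in range(6):
    assert recover_x(run_loop(secret, size=8)) == secret
```

The elements that were never written carry as much information about X as the ones that were, which is why tracking only explicit writes is not enough.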
Solution: Statically Specify Number of Iterations
• countjump instruction:
  • Specify number of loop iterations
  • jump target address
• Example (my understanding from the description)
  Loop start address:
  …………
  countjump # iterations, loop start address
• The first time countjump is encountered, the # iterations is loaded into an internal loop counter register
• The loop counter register is decremented every time countjump is encountered, and PC ← loop start address
• When the register becomes 0, PC ← PC + 1
• countjump cannot be predicated
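The behaviour described above can be simulated in a few lines. This is a sketch of one possible reading, not the paper's definition: it assumes a single, non-nested loop counter, an iteration count of at least 1, and that the loop body runs exactly that many times. The tuple-based program encoding and the function `run` are hypothetical.

```python
def run(program):
    """Execute a toy program and return the names of the executed body ops.

    Instructions are tuples:
      ("op", name)                      ordinary instruction, recorded in the trace
      ("countjump", iterations, target) loop-closing instruction
    """
    pc = 0
    loop_counter = None  # None means no loop is currently in progress
    trace = []
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "countjump":
            _, iterations, target = instr
            if loop_counter is None:
                loop_counter = iterations  # first encounter loads the counter
            loop_counter -= 1              # decremented on every encounter
            if loop_counter > 0:
                pc = target                # PC <- loop start address
            else:
                loop_counter = None
                pc += 1                    # counter reached 0: PC <- PC + 1
        else:
            trace.append(instr[1])
            pc += 1
    return trace

program = [("op", "init"), ("op", "body"), ("countjump", 3, 1), ("op", "done")]
assert run(program) == ["init", "body", "body", "body", "done"]
```

Because the iteration count is an immediate in the instruction and the instruction cannot be predicated, the sequence of PC values does not depend on any data value.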
Early Termination
• In “C”, we have the “break” statement, which can terminate a loop early
• Here, the paper proposes:
  • Predicate all the instructions in the loop with the termination condition
  • When the termination condition becomes true, the loop body has no effect
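As an illustration of the idea, here is a linear search written without `break`: the loop always runs a fixed number of iterations, and every write in the body is guarded by a "not yet terminated" predicate. The helper `select` stands in for a predicated instruction and, like `find_first`, is a hypothetical name; the Python conditional expression only models the hardware behaviour and is not itself constant-time.

```python
def select(pred, new, old):
    """Predicated write: the new value takes effect only when pred is 1."""
    return new if pred else old

def find_first(values, key):
    """Index of the first element equal to key, or -1, using a fixed trip count."""
    found = 0    # termination condition, initially false
    index = -1
    iterations = 0
    for i in range(len(values)):      # trip count known before the loop starts
        hit = int(values[i] == key)
        active = 1 - found            # body predicated on "not yet terminated"
        index = select(active & hit, i, index)
        found = select(active & hit, 1, found)
        iterations += 1
    assert iterations == len(values)  # the loop never exits early
    return index

assert find_first([4, 7, 7, 1], 7) == 1
assert find_first([4, 7], 9) == -1
```

After the first hit, the remaining iterations still execute but leave `index` and `found` unchanged, which is the predicated equivalent of `break`.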
Step 3: Constraining Loads and Stores
• Indirect loads and stores are bad
  • e.g., M[reg] ← value
  • If reg is untrusted, then essentially all the memory locations become untrusted
  • “Intuitively, the problem is that accessing one untrusted address causes every other address to become implicitly untrusted by virtue of them not being accessed or modified.”
• Limit the ISA to only allow:
  • Direct load/store: addresses are immediate constants
  • Loop-relative addressing: load-looprel, store-looprel
    • e.g., load-looprel R0, 0x100, C0 loads M[0x100 + C0]
    • C0..C7 are counters: explicitly initialized by init-counter, and incremented by a fixed value with increment-counter
    • counter operations cannot be predicated
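A rough model of loop-relative addressing, to show that every address is formed from an immediate base plus a counter and never from a data register. The operand layout of `init_counter` and `increment_counter` is an assumption (the slides only name the instructions), and the `Machine` class and `sum_array` helper are hypothetical.

```python
class Machine:
    def __init__(self, memory):
        self.mem = list(memory)
        self.regs = [0] * 8       # R0..R7, general purpose
        self.counters = [0] * 8   # C0..C7, used only for addressing

    def init_counter(self, c, value):
        self.counters[c] = value          # value is an immediate (assumed)

    def increment_counter(self, c, step):
        self.counters[c] += step          # step is an immediate (assumed)

    def load_looprel(self, r, base, c):
        self.regs[r] = self.mem[base + self.counters[c]]

    def store_looprel(self, r, base, c):
        self.mem[base + self.counters[c]] = self.regs[r]

def sum_array(memory, base, n):
    """Sum n words starting at base using only loop-relative loads."""
    m = Machine(memory)
    m.init_counter(0, 0)
    for _ in range(n):                    # stands in for a countjump loop
        m.load_looprel(0, base, 0)        # R0 <- M[base + C0]
        m.regs[1] += m.regs[0]            # R1 accumulates the sum
        m.increment_counter(0, 1)
    return m.regs[1]

memory = [0] * 0x110
memory[0x100:0x104] = [1, 2, 3, 4]
assert sum_array(memory, 0x100, 4) == 10
```

Since the counters are set and advanced only by unpredicated instructions with immediate operands, the sequence of addresses touched is fixed by the program text alone.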
Proof-of-Concept Implementation
• Verilog
• Use Altera’s Quartus II software to synthesize it onto a Stratix II FPGA
• 32-bit machine
• 64KB instruction memory, 64KB data memory
• Registers:
  • A program counter
  • 8 general-purpose registers
  • 2 predicate registers
  • 8 registers to store loop counters (that count down the number of iterations)
  • 8 other registers to store explicit array indices (used as offsets for load-looprel and store-looprel instructions)
• No pipelining
Augment the Processor with GLIFT Logic
• Each bit of processor state is explicitly shadowed:
  • every register gets a shadow register
  • every memory has a shadow RAM
• The logic and signals are shadowed by generating the proper trust propagation logic
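To make per-bit shadowing concrete, here is a sketch of a 32-bit bitwise AND operating on a register and its shadow register together. It assumes one shadow bit per data bit (1 = untrusted) and applies the AND-gate shadow rule to all 32 bit positions in parallel; the function `and32` is an illustrative name, not part of the paper's design.

```python
MASK32 = 0xFFFFFFFF

def and32(a, a_t, b, b_t):
    """Bitwise AND of two 32-bit registers plus their shadow registers.

    a_t and b_t hold one trust bit per data bit (set bit = untrusted).
    Returns (value, shadow) for the result register.
    """
    value = a & b & MASK32
    shadow = ((a_t & b_t) | (a_t & b) | (b_t & a)) & MASK32
    return value, shadow

# A trusted mask applied to a fully untrusted word: the bits forced to 0
# by the mask come out trusted, the bits passed through stay untrusted.
value, shadow = and32(0x0000FFFF, 0x00000000, 0x12345678, 0xFFFFFFFF)
assert value == 0x00005678
assert shadow == 0x0000FFFF
```

Tracking trust per bit rather than per register is what lets the upper half of the result be treated as trusted in this example.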
ISA
A code snippet from the SubBytes function in the AES encryption algorithm
Basically this is the following in “C”:
for (i=0; i<16; i++) { state[i] = SBox[state[i]]; }
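Because `state[i]` is data, `SBox[state[i]]` is an indirect load, which the restricted ISA does not allow. One plausible way to express it, consistent with the evaluation slide's remark that table look-ups become full table iterations, is to scan the whole table and keep the matching entry with a predicated move. The sketch below is an assumption about how that could look, not the paper's assembly, and the table used is a stand-in permutation rather than the real AES S-box.

```python
def lookup_by_scan(table, index):
    """Return table[index] by visiting every entry in a fixed-length loop."""
    result = 0
    for i in range(len(table)):                  # address depends only on the loop counter
        match = int(i == index)                  # predicate computed from the data
        result = table[i] if match else result   # models a predicated move
    return result

def sub_bytes(state, sbox):
    """state[i] = SBox[state[i]] for every byte, using scans instead of indexing."""
    return [lookup_by_scan(sbox, b) for b in state]

# Stand-in table: NOT the AES S-box, only a 256-entry permutation for the demo.
demo_sbox = [(7 * i + 3) % 256 for i in range(256)]
state = list(range(0, 160, 10))                  # 16 example bytes
assert sub_bytes(state, demo_sbox) == [demo_sbox[b] for b in state]
```

With a 16-byte state and a 256-entry table this performs 16 × 256 = 4096 table visits instead of 16 indexed loads, which gives a feel for why dynamic instruction counts grow for table-heavy kernels.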
Hardware Impact
Altera’s Nios is a commercial product: RISC instruction set, reasonably optimized
Nios econ: unpipelined 6 stage core, without caches, branch-predictors etc.
Nios std: pipelined, 4KB instruction cache
GLIFT base: unpipelined, no tracking
GLIFT full: GLIFT base + tracking
Hardware Impact
70% area increase compared to GLIFT base
Small frequency degradation: adding GLIFT tracking does not have a big impact on latency
Application Kernels
Dynamic instruction counts vary substantially
• FSM and AES have a lot of table look-ups, which become full table iterations
Conclusions
• Bigger, slower, harder to program, and computationally less powerful
• For the first time, provides the ability to account for all information flows through the chip.
• My learning:
  • A deeper understanding of information leaks
  • The effort required to prevent leaks is very significant
  • Programmability is sacrificed: restrictions on loops and load/store
  • The proof-of-concept does not even address issues such as caches