Page 1: Secure Compiler Seminar 9/12 Survey on Design of Secure Low-Level Languages Toshihiro YOSHINO, Yonezawa Lab. tossy-2@yl.is.s.u-tokyo.ac.jp

Secure Compiler Seminar 9/12

Survey on Design of Secure Low-Level Languages

Toshihiro YOSHINO, Yonezawa Lab. <tossy-2@yl.is.s.u-tokyo.ac.jp>

A Secure Low-Level Language is Needed

• As the target language of the secure compiler
• "Secure" means it has a method to prove a program's properties
  • Memory safety, control-flow safety, …
  • For this, its concrete formal model should be given
• It should also be low-level
  • To reduce the complexity of the JIT compiler (in other words, of the TCB)

[Figure: High-Level Program → Secure Compiler → Secure Low-Level Language → Verifier → JIT compilation → Machine Executable]

Existing Research

Two major approaches

1. TAL, PCC
  • Extension to conventional assembly languages
  • Utilizes certain logic (such as type theories) to prove safety
2. Virtual Machines
  • Introduce intermediate languages of their own
  • Many of them adopt a safe-by-construction design
    • e.g. Java VM is semantically safe in memory operations
  • Java VM, Microsoft CIL, mvm [Franz et al. 2003], Jinja [Klein et al. 2006], ADL [Yoshino 2006], …

Comparison: PCC vs. Java VM

• PCC (Machine Dependent)
  • (Extended) machine code + proof
  • Generated code and proof are machine-dependent
  • Requires one VC implementation for each architecture
• Java VM (Machine Independent)
  • Verifier ensures type and control flow safety
  • It often restricts optimizations
    • Leads to performance degradation
  • High cost to perform verification
    • Stack is nothing more than a set of untyped (variable-number) registers

Limitations of Java Bytecode Verification

• Initialization check is incomplete
  • Correlation with other variables is not taken into account
  • Example:

    class Test {
        int test(boolean b) {
            int i;
            try {
                if (b) return 1;
                i = 2;
            } finally {
                if (b) i = 3;
            }
            return i;
        }
    }

• Incomplete common subexpression elimination
  • Cannot eliminate common subexpressions in address calculations (array refs)
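The initialization-check limitation can be illustrated with a small Python sketch. It is not the real JVM verifier: the control-flow graph below is hand-written for `Test.test`, its node names are hypothetical, and it assumes the finally block is shared by both exits of the try block. A path-insensitive "definitely assigned" analysis merges all paths by intersection, so it cannot see that `return i` is only reachable when `b` is false.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-written CFG for Test.test(boolean b).
# DEFS: variables assigned in each node; SUCCS: control-flow successors.
DEFS: Dict[str, Set[str]] = {
    "entry": {"b"},
    "try_if_b": set(),
    "pending_return_1": set(),   # `return 1` must run the finally block first
    "i_assign_2": {"i"},
    "finally_if_b": set(),
    "i_assign_3": {"i"},
    "finally_exit": set(),
    "return_1": set(),
    "return_i": set(),           # this node reads i
}
SUCCS: Dict[str, List[str]] = {
    "entry": ["try_if_b"],
    "try_if_b": ["pending_return_1", "i_assign_2"],
    "pending_return_1": ["finally_if_b"],
    "i_assign_2": ["finally_if_b"],
    "finally_if_b": ["i_assign_3", "finally_exit"],
    "i_assign_3": ["finally_exit"],
    "finally_exit": ["return_1", "return_i"],
    "return_1": [],
    "return_i": [],
}


def definitely_assigned(defs, succs, entry="entry") -> Dict[str, Set[str]]:
    """Forward must-analysis: a variable counts as assigned at a node only
    if it is assigned on every CFG path to it (merge = set intersection)."""
    all_vars = set().union(*defs.values())
    state_in = {node: set(all_vars) for node in defs}   # optimistic start
    state_in[entry] = set()
    changed = True
    while changed:
        changed = False
        for node in defs:
            out = state_in[node] | defs[node]
            for succ in succs[node]:
                merged = state_in[succ] & out
                if merged != state_in[succ]:
                    state_in[succ] = merged
                    changed = True
    return state_in


def run(b: bool) -> Tuple[str, Set[str]]:
    """Concrete execution of the same method, tracking assigned variables."""
    assigned = {"b"}
    pending_return = b            # if (b) return 1;
    if not pending_return:
        assigned.add("i")         # i = 2;
    if b:
        assigned.add("i")         # finally { if (b) i = 3; }
    return ("return_1" if pending_return else "return_i"), assigned
```

The analysis reports that `i` may be unassigned at `return_i`, although every concrete run that reaches `return_i` has assigned it; the correlation between `b` and `i` is exactly what the merge throws away.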

Then What Should We Do?

• Maximize the machine-independent part
  • Avoids porting cost of the system
  • Tradeoff against the size of the TCB (Trusted Computing Base)
    • But recent works in PCC and TAL (e.g. Foundational PCC) aim solely to minimize the TCB
• Reduce proof size and generation cost
  • PCC requires much effort to produce a proof, because the target's level is very low
    • Registers and memory are untyped, etc.

mvm [Franz et al. 2003]

• Aimed to find the semantics level that:
  • Is effective at supporting proof-carrying code
  • Can also be translated efficiently into highly performing native code (on many platforms)
• Separated design between VM layer and PCC layer

[1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code. IVME ’03.

mvm: Virtual Machine Design

• Register-based architecture
  • The number of registers is not bounded
  • Registers are categorized by the type of values: Integer, Boolean, Pointer, Address
    • Pointer registers are used to store pointers to heap objects (more specifically, array heads)
    • Address registers are for storing results of address arithmetic
      • Bounds checks are not performed in arithmetic, so they have to be done in a higher layer
• Heap can be used to store objects
  • Heap model is explained next
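As a rough illustration of the register design above, the following Python sketch models four typed register banks with an unbounded number of registers each. All names (`RegisterFile`, `Pointer`, `Address`, `derive_address`, the bank letters) are invented for this example. The check is done at run time here only to make the separation visible; in mvm the separation of banks is a property of the instruction set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Pointer:
    """Points at the head of a heap array; array_id None models null."""
    array_id: Optional[int]


@dataclass(frozen=True)
class Address:
    """Result of address arithmetic: a base pointer plus an element index.
    Nothing here checks the index against the array bounds."""
    base: Pointer
    index: int


# One Python type per register bank (the bank letters are illustrative).
BANK_TYPES = {"i": int, "b": bool, "p": Pointer, "a": Address}


@dataclass
class RegisterFile:
    # Dictionaries model an unbounded supply of registers in each bank.
    banks: Dict[str, Dict[int, object]] = field(
        default_factory=lambda: {kind: {} for kind in BANK_TYPES})

    def write(self, kind: str, number: int, value: object) -> None:
        # Exact type match, so a bool is not accepted as an integer.
        if type(value) is not BANK_TYPES[kind]:
            raise TypeError(f"register {kind}{number} cannot hold {value!r}")
        self.banks[kind][number] = value

    def read(self, kind: str, number: int) -> object:
        return self.banks[kind][number]


def derive_address(pointer: Pointer, index: int) -> Address:
    """Address arithmetic without a bounds check (left to a higher layer)."""
    return Address(pointer, index)
```

For instance, `RegisterFile().write("i", 0, True)` raises `TypeError`, while `derive_address(Pointer(7), 99)` succeeds even if array 7 has fewer than 100 elements.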

mvm: Virtual Machine Design

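[Figure: the mvm virtual machine and its heap. The machine holds a program (label1: instr instr … label2: …) and register banks typed Integer, Boolean, Pointer and Address, with example contents 1, 42, false, true]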

mvm: Heap Model

• The mvm heap consists of arrays of objects
• Object representation in mvm
  • Each object is tagged
    • The tag can only be written by the new operation and is immutable after creation
  • Two sections of data area: values and pointers
    • Integers and booleans are stored in the first section
    • Pointers are stored in the second section

[Figure: example object holding the values 1 and 42]
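A minimal Python sketch of this object representation, under stated assumptions: the `LAYOUT` table and its two tags are hypothetical, and the class and function names are invented for illustration. It shows a tag that is fixed at creation and a data area split into a value section and a pointer section.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical layout table:
# tag -> (size of value section, size of pointer section).
LAYOUT: Dict[int, Tuple[int, int]] = {0: (1, 0), 1: (2, 1)}


class HeapObject:
    def __init__(self, tag: int, size_v: int, size_p: int) -> None:
        self._tag = tag                                         # written once, at creation
        self.values: List[Union[int, bool]] = [0] * size_v      # first section
        self.pointers: List[Optional[list]] = [None] * size_p   # second section

    @property
    def tag(self) -> int:
        """Read-only: there is deliberately no setter."""
        return self._tag


def new(tag: int, count: int) -> List[HeapObject]:
    """Create an array of `count` objects of type `tag`."""
    size_v, size_p = LAYOUT[tag]
    return [HeapObject(tag, size_v, size_p) for _ in range(count)]
```

Assigning to `obj.tag` raises `AttributeError`, which mirrors the rule that only the new operation can write a tag.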

mvm: Heap Model

• A type is associated with its tag value, layout and structure
  • This association is managed by the compiler
  • Layout describes the sizes of the data sections
  • Structure describes the possible substructure inside the pointer section
• Example of type information
  • <> means a tuple
  • {} means disjunction

datatype T = Int of int | Pair of int * int

… T list …

mvm: Heap Model

• Example of a T list object tree

[Figure: object tree with nodes labeled T list, T and int*int]

mvm: Instructions

• Arithmetic and logical calculation
  • Similar to many other languages ;-)
• Branch
  • Unconditional: goto label
  • Conditional: brtrue b_i, label / brfalse b_i, label
    • Condition must be taken from a boolean register
    • Jump is allowed only to a label
  • Conditional by object tag (RTTI): iftag
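The branch rules above can be sketched as a toy interpreter in Python. The tuple encoding of instructions is an assumption of this example, and `lt` and `addi` are hypothetical mnemonics added only so that the loop has something to compute; `goto`, `brtrue` and `brfalse` follow the slide. Jump targets are resolved through a label table, so a jump to anything other than a label fails.

```python
from typing import Dict, List, Tuple


def run(program: List[Tuple], int_regs: Dict[int, int],
        max_steps: int = 10_000) -> Dict[int, int]:
    # Only label positions are legal jump targets.
    labels = {ins[1]: pos for pos, ins in enumerate(program) if ins[0] == "label"}
    bool_regs: Dict[int, bool] = {}
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return int_regs
        op, *args = program[pc]
        pc += 1
        if op == "label":
            continue
        elif op == "goto":
            pc = labels[args[0]]
        elif op in ("brtrue", "brfalse"):
            cond = bool_regs[args[0]]           # condition comes from a boolean register
            if cond == (op == "brtrue"):
                pc = labels[args[1]]
        elif op == "lt":                        # hypothetical: b[dst] = i[src] < const
            dst, src, const = args
            bool_regs[dst] = int_regs[src] < const
        elif op == "addi":                      # hypothetical: i[dst] = i[src] + const
            dst, src, const = args
            int_regs[dst] = int_regs[src] + const
        else:
            raise ValueError(f"unknown instruction {op}")
    raise RuntimeError("step limit exceeded")


# Count i0 up to 5.
PROGRAM = [
    ("label", "loop"),
    ("lt", 0, 0, 5),            # b0 = i0 < 5
    ("brfalse", 0, "done"),
    ("addi", 0, 0, 1),          # i0 = i0 + 1
    ("goto", "loop"),
    ("label", "done"),
]
```

Running `run(PROGRAM, {0: 0})` leaves 5 in integer register 0, and a `goto` to an undefined label raises `KeyError`.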

mvm: Instructions

• Object creation and access
  • p_j = new(tag, i_k)
    • Creates an array of i_k objects with type tag
  • r = load([size_v, size_p] | tag, p_k, offset)
  • store([size_v, size_p] | tag, p_k, offset, r)
    • Sizes and tag are used to check memory safety
• Pointer registers and address registers
  • Object access also permits address registers a_k
  • This distinction is for supporting garbage collection
    • Address registers always contain "derived" pointers

mvm: Instructions

• Accessing arrays
  • a_n = adda([size_v, size_p] | tag, p_k, i_l)
    • Calculates the address of the i_l-th element in an array of type tag stored at p_k
  • i_n = getlen(p_k)
• Guards
  • Bounds checking: CHECKLEN(p_k, i_l)
  • Validity checking: CHECKNOTNULL(p_k)
  • Type checking: CHECKTAG(p_i, [size_v, size_p] | tag)
  • These guards are inserted when static checking fails
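The three guards can be pictured as run-time checks placed in front of a memory access. The Python sketch below is an illustration only: `Obj`, `GuardFailure` and `guarded_load` are invented names, arrays are modeled as Python lists, and a null pointer is modeled as `None`. A compiler following the slide would emit a guard only where the corresponding static check could not be discharged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class GuardFailure(Exception):
    """Raised when a dynamic guard fails."""


@dataclass
class Obj:
    tag: int
    values: List[int]
    pointers: list = field(default_factory=list)


def check_not_null(p: Optional[List[Obj]]) -> None:
    if p is None:
        raise GuardFailure("null pointer")


def check_len(p: List[Obj], i: int) -> None:
    if not 0 <= i < len(p):
        raise GuardFailure(f"index {i} out of bounds")


def check_tag(obj: Obj, size_v: int, size_p: int, tag: int) -> None:
    if (obj.tag, len(obj.values), len(obj.pointers)) != (tag, size_v, size_p):
        raise GuardFailure("unexpected tag or layout")


def guarded_load(p: Optional[List[Obj]], i: int,
                 size_v: int, size_p: int, tag: int, offset: int) -> int:
    """Load value `offset` of element `i`, running every guard first."""
    check_not_null(p)
    check_len(p, i)
    obj = p[i]
    check_tag(obj, size_v, size_p, tag)
    if not 0 <= offset < size_v:
        raise GuardFailure("offset outside the value section")
    return obj.values[offset]
```

With `arr = [Obj(1, [10, 20]), Obj(1, [30, 40])]`, the call `guarded_load(arr, 1, 2, 0, 1, 0)` returns 30, while a null array, an index of 2, or a wrong tag each raise `GuardFailure`.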

An Example mvm Program

Type Safety in mvm Programs

• Operations on primitives are all type-safe
  • Because registers to store values are distinct
• Type-safety proofs are needed only for non-primitive operations
  • Pointers, arrays and records
  • For every pointer operation, check that the result pointer:
    • Points to the beginning of an array, record or value
    • Points to an object of the correct type

Jinja [Klein et al. 2006]

• A Java-like programming language built on Isabelle/HOL
  • Formal descriptions of the Jinja language, the Jinja VM and the compiler are given
• Several properties were machine-checked
  • Big-step evaluation and small-step evaluation (atomic operations) are equivalent
  • Compiler correctness

[2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler. TOPLAS 28(4), 2006.

Jinja Language

• "Jinja is not Java"
• Object-oriented language with exceptions
  • A program is a set of class definitions, and a class consists of several fields and methods
    • Method body is an expression
  • Overriding is supported as in Java
    • But not overloading, because it is complicated
• Language is statically typed
  • Type system ensures that the execution of a well-typed program never gets stuck

Jinja Language: Language Elements

• Values
  • Boolean Bool b, Integer Intg i, Reference Addr a
  • Null reference Null, Dummy value Unit
• Expressions
  • Val v, binary operations e1 op e2, Var V, V := e, e1; e2, …
  • Conditional: if (e) e1 else e2 / Loop: while (e) e’ / Block: {V:T, e}
  • Object construction: new C
  • Casting: Cast C e
  • Field access: e.F{D}, e.F{D} := e
    • D is an annotation added in preprocessing (e.g., by the typechecker)
  • Method call: e.M(e, e, …)
  • Exception: throw e, try e1 catch(C V) e2

Jinja Language: Semantics

• Big step semantics
  • Typical operational semantics
    • State = <Heap, Local Variables>
  • Detail abbreviated because nothing special
• Small step semantics
  • Finer-grained semantics
    • One-step evaluation
  • Useful for formalizing parallelism (?)
    • Each (small) operation is considered atomic
    • Not discussed in the paper

Jinja Language: Semantics

• The big-step and small-step semantics are proven to be equivalent
• wwf-J-prog means "weak well-formedness", which is defined by the following properties:
  • The number of parameter types and the number of parameter names are equal
  • "this" is not included in the parameter list
  • Free variables in the method body only refer to this or to these parameters
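To make the two styles of semantics concrete, here is a Python sketch for a tiny expression fragment (values, variables, addition, assignment, sequencing). It is far smaller than Jinja: there is no heap, no objects and no exceptions, the state is only the local-variable map, and the tuple encoding of expressions is an assumption of this example. `None` stands in for the Unit value.

```python
from typing import Dict, Optional, Tuple

Env = Dict[str, int]


def big_step(e: tuple, env: Env) -> Tuple[Optional[int], Env]:
    """Evaluate e to a final value in one recursive judgment."""
    kind = e[0]
    if kind == "val":
        return e[1], env
    if kind == "var":
        return env[e[1]], env
    if kind == "add":
        v1, env = big_step(e[1], env)
        v2, env = big_step(e[2], env)
        return v1 + v2, env
    if kind == "assign":
        v, env = big_step(e[2], env)
        return None, {**env, e[1]: v}      # assignment evaluates to Unit
    if kind == "seq":
        _, env = big_step(e[1], env)
        return big_step(e[2], env)
    raise ValueError(kind)


def step(e: tuple, env: Env) -> Tuple[tuple, Env]:
    """Perform exactly one atomic reduction of a non-value expression."""
    kind = e[0]
    if kind == "var":
        return ("val", env[e[1]]), env
    if kind == "add":
        if e[1][0] != "val":
            e1, env = step(e[1], env)
            return ("add", e1, e[2]), env
        if e[2][0] != "val":
            e2, env = step(e[2], env)
            return ("add", e[1], e2), env
        return ("val", e[1][1] + e[2][1]), env
    if kind == "assign":
        if e[2][0] != "val":
            e2, env = step(e[2], env)
            return ("assign", e[1], e2), env
        return ("val", None), {**env, e[1]: e[2][1]}
    if kind == "seq":
        if e[1][0] != "val":
            e1, env = step(e[1], env)
            return ("seq", e1, e[2]), env
        return e[2], env
    raise ValueError(kind)


def small_step_eval(e: tuple, env: Env) -> Tuple[Optional[int], Env]:
    """Iterate single steps until only a value remains."""
    while e[0] != "val":
        e, env = step(e, env)
    return e[1], env


# x := 1 + 2; x + 4
EXAMPLE = ("seq",
           ("assign", "x", ("add", ("val", 1), ("val", 2))),
           ("add", ("var", "x"), ("val", 4)))
```

Both `big_step(EXAMPLE, {})` and `small_step_eval(EXAMPLE, {})` produce the value 7 and the final variables `{"x": 3}`. Agreement on one input is only a spot check; the machine-checked theorem covers all weakly well-formed programs.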

Jinja VM

• Similar to Java VM
  • Stack-based machine with heap
• State = <addr option, heap, frame list>
  • First element is possibly a generated exception
  • Third element is a call stack
• Frame = <stack, registers, cname, mname, pc>, where stack = value list, registers = value list
  • Evaluation of operands is done on the stack
  • Registers are for storing local variables

Jinja VM: Instructions

• Basic operations
  • Push v / Pop
  • Register operations: Load n / Store n
  • Arithmetic: IAdd, …
  • Logical operations: CmpEq, …
• Object manipulation
  • Construction: New cname
  • Casting: Checkcast cname
  • Field access: Getfield vname cname / Putfield vname cname
  • Method invocation: Invoke mname n

Jinja VM: Instructions

• Control flow operations
  • Branching: Goto n / IfFalse n
    • n is a relative offset from the instruction
  • Exit from a method: Return
• Exception
  • Throwing an exception: Throw
  • Information about exception handlers (try-catch) is attached to method declarations
    • Handler is retrieved from there when needed
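A single-frame Python sketch of the instructions listed so far may help fix the execution model. It is a simplification under stated assumptions: only Push, Pop, Load, Store, IAdd, CmpEq, Goto, IfFalse and Return are handled, there is no heap, no method invocation and no exception table, and the tuple encoding and the names `Frame` and `exec_method` are invented here. Branch offsets are relative to the branching instruction, as on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Frame:
    stack: list = field(default_factory=list)       # operand stack
    registers: list = field(default_factory=list)   # local variables
    pc: int = 0


def exec_method(code: List[Tuple], registers: list):
    """Run one method body in a single frame and return its result."""
    f = Frame(registers=list(registers))
    while True:
        op, *args = code[f.pc]
        offset = 1                       # default: fall through
        if op == "Push":
            f.stack.append(args[0])
        elif op == "Pop":
            f.stack.pop()
        elif op == "Load":
            f.stack.append(f.registers[args[0]])
        elif op == "Store":
            f.registers[args[0]] = f.stack.pop()
        elif op == "IAdd":
            b, a = f.stack.pop(), f.stack.pop()
            f.stack.append(a + b)
        elif op == "CmpEq":
            b, a = f.stack.pop(), f.stack.pop()
            f.stack.append(a == b)
        elif op == "Goto":
            offset = args[0]
        elif op == "IfFalse":
            if f.stack.pop() is False:
                offset = args[0]
        elif op == "Return":
            return f.stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")
        f.pc += offset


# Registers: 0 = n, 1 = acc.  Computes acc = n + (n-1) + ... + 1.
SUM_TO_N = [
    ("Load", 0), ("Push", 0), ("CmpEq",),   # 0-2: n == 0 ?
    ("IfFalse", 3),                         # 3:   if not, jump to 6
    ("Load", 1), ("Return",),               # 4-5: return acc
    ("Load", 1), ("Load", 0), ("IAdd",),    # 6-8: acc + n
    ("Store", 1),                           # 9:   acc = acc + n
    ("Load", 0), ("Push", -1), ("IAdd",),   # 10-12: n - 1
    ("Store", 0),                           # 13:  n = n - 1
    ("Goto", -14),                          # 14:  back to 0
]
```

`exec_method(SUM_TO_N, [4, 0])` returns 10. Like the machine described on these slides, the sketch does not check operand types before executing an instruction.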

Jinja VM: Semantics

• Please refer to the paper for details
  • Basically, straightforward and intuitive
• At this level, there are no runtime checks
  • For example, IAdd (integer addition) does not check whether its arguments are really integers
    • If they are not, the result is unspecified
  • This kind of check is performed by a bytecode verifier

Jinja VM Bytecode Verification

• The Jinja VM (JVM) relies on the following assumptions:
  • Types are correct
  • No overflow or underflow in the stack
  • Code containment
  • Register initialization before use
  • These are just the same as in the Java VM
• The bytecode verifier statically ensures these assumptions

Jinja VM Bytecode Verification

• Abstract interpretation
  • Instead of values, consider only types

[Figure: a JVM program shown with its concrete State (values such as Intg 1, Addr 1, Addr 2, Addr 3, 0) next to the corresponding State Type (Int, Class A, Class B)]
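The idea of running the program on types instead of values can be sketched in Python as follows. This is a simplified illustration, not the verifier formalized in the Jinja paper: it knows only the types Int and Bool, handles a handful of instructions encoded as tuples, and uses a hypothetical `Top` type for anything unusable (an uninitialized register, or conflicting types after a merge). The four assumptions listed earlier appear as the four kinds of `VerifyError`.

```python
from typing import Dict, List, Tuple

TOP = "Top"   # unusable: uninitialized, or conflicting types after a merge


class VerifyError(Exception):
    pass


def type_of(value) -> str:
    return "Bool" if isinstance(value, bool) else "Int"


def merge(old, new):
    """Join two abstract states that reach the same instruction."""
    (s1, r1), (s2, r2) = old, new
    if len(s1) != len(s2):
        raise VerifyError("stack heights differ at a join point")
    join = lambda a, b: a if a == b else TOP
    return tuple(map(join, s1, s2)), tuple(map(join, r1, r2))


def verify(code: List[Tuple], reg_types: List[str], max_stack: int) -> Dict[int, tuple]:
    states = {0: ((), tuple(reg_types))}     # pc -> (stack types, register types)
    work = [0]
    while work:
        pc = work.pop()
        stack, regs = states[pc]
        op, *args = code[pc]
        succs = [pc + 1]
        if op == "Push":
            stack += (type_of(args[0]),)
        elif op == "Load":
            if not 0 <= args[0] < len(regs) or regs[args[0]] == TOP:
                raise VerifyError("register read before initialization")
            stack += (regs[args[0]],)
        elif op == "Store":
            n = args[0]
            if not stack or not 0 <= n < len(regs):
                raise VerifyError("stack underflow or bad register")
            regs = regs[:n] + (stack[-1],) + regs[n + 1:]
            stack = stack[:-1]
        elif op == "IAdd":
            if stack[-2:] != ("Int", "Int"):
                raise VerifyError("IAdd needs two Int operands")
            stack = stack[:-2] + ("Int",)
        elif op == "IfFalse":
            if stack[-1:] != ("Bool",):
                raise VerifyError("IfFalse needs a Bool operand")
            stack = stack[:-1]
            succs = [pc + 1, pc + args[0]]
        elif op == "Goto":
            succs = [pc + args[0]]
        elif op == "Return":
            if not stack:
                raise VerifyError("stack underflow")
            succs = []
        else:
            raise VerifyError(f"unknown instruction {op}")
        if len(stack) > max_stack:
            raise VerifyError("stack overflow")
        for s in succs:
            if not 0 <= s < len(code):
                raise VerifyError("control leaves the method body")
            new = (stack, regs)
            merged = merge(states[s], new) if s in states else new
            if states.get(s) != merged:
                states[s] = merged
                work.append(s)
    return states


GOOD = [("Load", 0), ("Push", 1), ("IAdd",), ("Store", 1),
        ("Push", True), ("IfFalse", 3),
        ("Load", 1), ("Return",),
        ("Load", 0), ("Return",)]
```

`verify(GOOD, ["Int", TOP], 2)` succeeds, whereas a program that adds a Bool to an Int, reads register 1 before storing to it, or pushes three values with `max_stack=2` is rejected before it ever runs.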

Jinja Compiler and its Correctness

• 2-staged compilation
  1. Map parameter names to register indices
    • Assign local variables to registers
    • Gather variable occurrences and use them for lookup
  2. Code generation
    • expression → instruction list (compE2)
      • Straightforward definition
    • Exception table generation (compEx2)
      • Separated from compE2, because the exception table must contain global addresses
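The two stages can be sketched in Python for a small expression fragment. This is an illustration under stated assumptions, not the Isabelle definition: `register_map` and `comp_e2` are names chosen for this example (the latter after compE2), expressions are encoded as tuples, register 0 is assumed to hold `this`, locals are numbered by first occurrence rather than by block structure, `None` stands in for Unit, and exception tables are omitted.

```python
from typing import Dict, List, Tuple


def register_map(params: List[str], body: tuple) -> Dict[str, int]:
    """Stage 1: map names to register indices. Register 0 is reserved for
    `this` (an assumption of this sketch), parameters follow, then locals
    in order of first occurrence."""
    regs = {"this": 0}
    for name in params:
        regs[name] = len(regs)

    def visit(e: tuple) -> None:
        if e[0] in ("var", "assign") and e[1] not in regs:
            regs[e[1]] = len(regs)
        for child in e[1:]:
            if isinstance(child, tuple):
                visit(child)

    visit(body)
    return regs


def comp_e2(e: tuple, regs: Dict[str, int]) -> List[Tuple]:
    """Stage 2: translate an expression into a list of stack instructions."""
    kind = e[0]
    if kind == "val":
        return [("Push", e[1])]
    if kind == "var":
        return [("Load", regs[e[1]])]
    if kind == "add":
        return comp_e2(e[1], regs) + comp_e2(e[2], regs) + [("IAdd",)]
    if kind == "assign":
        # Store the value, then push Unit as the result of the assignment.
        return comp_e2(e[2], regs) + [("Store", regs[e[1]]), ("Push", None)]
    if kind == "seq":
        # Discard the first result before evaluating the second expression.
        return comp_e2(e[1], regs) + [("Pop",)] + comp_e2(e[2], regs)
    if kind == "if":
        cond, then, other = (comp_e2(x, regs) for x in e[1:])
        # Offsets are relative to the branching instruction itself.
        return (cond + [("IfFalse", len(then) + 2)] + then
                + [("Goto", len(other) + 1)] + other)
    raise ValueError(kind)


# y := x + 1; if (false) x else y
BODY = ("seq",
        ("assign", "y", ("add", ("var", "x"), ("val", 1))),
        ("if", ("val", False), ("var", "x"), ("var", "y")))
```

For `BODY` with the single parameter `x`, stage 1 yields `{"this": 0, "x": 1, "y": 2}`, and stage 2 produces an instruction list in which `IfFalse 3` skips the then-branch and `Goto 2` skips the else-branch.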

Jinja Compiler and its Correctness

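• Correctness of compilation
  • If a program is weakly well-formed, then:

[Figure: commuting diagram. Compilation maps the Jinja program to JVM bytecode; evaluating the Jinja program from (Heap, Vars) to (Heap, Vars) corresponds to executing the bytecode from (Heap, [Frame]) to (Heap, [])]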

Implementation of Jinja

• http://afp.sourceforge.net/entries/Jinja.shtml
• About 20kLoC in Isabelle/HOL
• Over 1,000 theorems are defined
• It takes about 25 min. to process these proofs on a 3GHz Pentium 4 machine with 1GB RAM

Summary of Today’s Talk

• We need a secure low-level language as the target of a secure compiler
  • Minimize the machine-dependent part to reduce implementation cost
  • Also reduce the cost of proof generation
• To answer this, we surveyed two VM projects
  • mvm
    • Aimed to find the "sweet spot" that reconciles high performance and small type-safety proofs
  • Jinja
    • Constructed a unified formal model of a Java-like language, the underlying VM and the compiler
    • In contrast to mvm, this research is oriented toward higher-level languages and the compiler's properties

How about ADL [Yoshino 2006] …?

• The position of ADL is close to mvm
  • To provide a common basis for implementing verifiers for low-level languages
• The assumed translation direction is opposite
  • mvm is an intermediate code of compilation
  • ADL is designed to simulate real machines

[Figure: diagram relating Secure L3, Machine Code, JVM/mvm and ADL]

How about ADL [Yoshino 2006] …?

• ADL takes a minimalist approach
  • Only 7 kinds of commands
  • Instead, an expression-based design allows complex formulae to be written easily
• Can ADL be used as an intermediate language?
  • Probably some modification is needed
    • Register allocation is done, but except for variables
  • The minimalist design, however, may increase complexity in constructing a verification logic
    • Abstract interpretation is often not sufficient, so a verification logic may want to calculate exact values

More References

• LLVM Project [Lattner 2000]
  • http://www.llvm.org/
  • Use VM for interprocedural optimization
• SafeTSA [Amme et al. 2001]
  • SSA-based language for mobile code security
• Dis virtual machine [Winterbottom et al. 1997]
• Omniware system [Adl-Tabatabai et al. 1996]

(Typical) Compiler Construction and Several Intermediate Languages

[Figure: compiler pipeline from High-Level Program to Machine Executable, with the stages Lexing / Parsing, Type Checking, Normalize (SSA, etc.), Optimize, Intermediate Code Generation, Register Allocation, Target Code Generation and Pretty Printing. The intermediate languages placed along the pipeline are Java, CIL, Jinja (VM), LLVM, SafeTSA, mvm, ADL, TAL and PCC]