
INSE 6150 Security Evaluation Methodologies

Vulnerability Analysis Prof. Jeremy Clark

Presented by: Gaby Dagher

The lecture notes are based on materials by Dr. M. Debbabi.

Agenda

- Introduction
- Flow Analysis
- MetaCompilation approach
  - Metal Extensions
  - Intraprocedural algorithm
- Conclusion

Motivations

[ Source: National Vulnerability Database (NIST) ]

Motivation

- Securing software is a challenging task.
- Lines of code:
  - Windows Vista: 50 million
  - Mac OS X 10.4: 86 million
  - GNU/Linux: 283 million

Vulnerability Analysis

- Vulnerability analysis can be achieved by program analysis.
- Program analysis:
  - Determines properties of programs, expressions, statements and data.
  - Extracts information from programs.
- Two types of analysis:
  - Static analysis: analyzes programs without executing them.
  - Dynamic analysis: analyzes programs at runtime.

Dynamic Analysis

- Advantages:
  - Information is available at runtime.
  - Easier to implement.
- Disadvantages:
  - Results are valid only for the execution path that was actually run.
  - Significant overhead during program execution.

Static Analysis

- Advantages:
  - No overhead at runtime.
  - A large body of research results (algorithms, methodologies, frameworks, tools, etc.).
  - Analyzes all execution paths.
- Disadvantages:
  - Somewhat elaborate to design and implement.
  - Undecidability issues.

Static Analysis

- Static analysis initially emerged in the domain of compiler optimization.

Compilation pipeline:

  Source program, Grammar
    -> Lexical Analysis
    -> Tokens
    -> Syntactic Analysis (Parser)
    -> Abstract Syntax Tree (AST)
    -> Semantic/Static Analysis
    -> Static Information
    -> Optimization
    -> Modified AST
    -> Code generation
    -> Object/Executable Code

Optimizations that rely on static information:

- Inline Expansion
- Dead Code Elimination
- Common Expression Elimination
- Moving Loop Invariants
- Constant Propagation
- ...
- Parallel Execution

Definitions

Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.

Syntactic analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar that defines the language's syntax.

Semantic analysis is the phase in which the compiler adds semantic information to the parse tree. This phase performs semantic checks such as type checking (checking for type errors) or assignment checking (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings.

Optimization Examples

- Example 1: Common expression elimination.
- Example 2: Moving loop invariants.
- Example 3: Constant propagation.
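The three optimizations named above can be illustrated with small before/after pairs. The functions below are hypothetical examples written for these notes (they are not taken from the original slides); each "after" version is what an optimizer could produce from the "before" version without changing the result.

```python
# Common expression elimination: a + b is computed once and reused.
def cse_before(a, b):
    x = (a + b) * 2
    y = (a + b) * 3      # a + b is recomputed here
    return x, y

def cse_after(a, b):
    t = a + b            # shared subexpression computed once
    return t * 2, t * 3

# Moving loop invariants: scale * offset does not change inside the loop.
def licm_before(values, scale, offset):
    out = []
    for v in values:
        k = scale * offset   # same value on every iteration
        out.append(v + k)
    return out

def licm_after(values, scale, offset):
    k = scale * offset       # hoisted out of the loop
    out = []
    for v in values:
        out.append(v + k)
    return out

# Constant propagation (followed by constant folding): n is known to be 4.
def constprop_before():
    n = 4
    m = n * 2
    return m + 1

def constprop_after():
    return 9
```

In each pair the observable behaviour is identical; only the amount of work differs.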


Flow Analysis

- The purpose of flow analysis is to determine which functions can be called, and which data can be reached, from various program points during execution of the program.
- Generally, flow analysis refers to two types:
  - Control-flow analysis
  - Data-flow analysis

Control-Flow Analysis

    x := a + b;
    y := a * b;
    while (y > a + b) {
        a := a + 1;
        x := a + b
    }

[Figure: control-flow graph of the program, with nodes x := a + b, y := a * b, if y > a + b, a := a + 1, x := a + b]

- Control-flow graphs are state-transition systems.

Data-Flow Analysis

- Available expressions: for each program point p, which expressions must have already been computed, and not later modified, on all paths to p.
- Optimization: where an expression is available, it need not be recomputed.

[Figure: the control-flow graph of the program above (nodes x := a + b, y := a * b, if y > a + b, a := a + 1, x := a + b), with the annotation "a+b is available here"]

Data-Flow Analysis: Example

[Figure: the same control-flow graph with the set of available expressions annotated on each edge. Annotations recovered from the figure: a+b; a+b, a*b; - (twice); a+b; a+b, y > a+b; a+b, y <= a+b]
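The available-expressions analysis on the five-statement program can be sketched as an iterative fixpoint computation. This is a minimal sketch, not the algorithm of any particular compiler: the node numbering 1 to 5, the gen/kill formulation and all function names are assumptions made for illustration, and variable use is detected by substring matching, which is only adequate because every variable here is a single letter.

```python
# node -> (variable assigned or None, expressions evaluated in the node)
NODES = {
    1: ("x", ["a+b"]),    # x := a + b
    2: ("y", ["a*b"]),    # y := a * b
    3: (None, ["a+b"]),   # test y > a + b
    4: ("a", ["a+1"]),    # a := a + 1
    5: ("x", ["a+b"]),    # x := a + b
}
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)]
ALL_EXPRS = {e for _, exprs in NODES.values() for e in exprs}

def uses(expr, var):
    return var in expr  # single-letter variable names only

def gen(node):
    # Expressions computed in the node and not invalidated by its own assignment.
    var, exprs = NODES[node]
    return {e for e in exprs if var is None or not uses(e, var)}

def kill(node):
    # Expressions whose value may change because the node assigns to var.
    var, _ = NODES[node]
    return set() if var is None else {e for e in ALL_EXPRS if uses(e, var)}

def available_expressions(entry_node=1):
    # Start from "everything available" and shrink to the greatest fixpoint.
    exit_sets = {n: set(ALL_EXPRS) for n in NODES}
    entry_sets = {n: set() for n in NODES}
    changed = True
    while changed:
        changed = False
        for n in NODES:
            preds = [s for s, t in EDGES if t == n]
            if n == entry_node or not preds:
                new_entry = set()
            else:
                new_entry = set.intersection(*(exit_sets[p] for p in preds))
            new_exit = (new_entry - kill(n)) | gen(n)
            if new_entry != entry_sets[n] or new_exit != exit_sets[n]:
                entry_sets[n], exit_sets[n] = new_entry, new_exit
                changed = True
    return entry_sets, exit_sets

entry_sets, exit_sets = available_expressions()
```

Under these assumptions `a+b` is available at the entry of the loop test, so the comparison need not recompute it, while nothing is available after `a := a + 1`.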


MetaCompilation Approach

- The MetaCompilation (MC) approach takes advantage of the compilation process to check for violations of security rules in source code.
- Main objectives:
  - Find as many security bugs as possible.
  - Define an approach that scales to large programs.

MetaCompilation Approach

- The MetaCompilation approach:
  - Maps a security rule to code statements.
  - Defines rules as high-level, system-specific checkers.
  - Dynamically links the checkers to the compiler.
  - Lets the compiler perform the essence of the analysis and check for security rule violations.

MetaCompilation Components

- metal: a high-level automata language used to express security properties.
- xgcc: an interprocedural analysis engine that executes metal extensions.

Example (Linux fs/proc/generic.c):

    ent->data = kmalloc(..)
    if(!ent->data) {
        free(ent);
        goto out;
    }
    …
    out:
        return ent;

[Figure: the source code above and the free checker are fed to the xgcc compiler, which reports "using ent after free!"]


Metal Extensions/Security Checkers

- Programmers use a high-level automata language called metal to express their application-specific security checkers, or extensions.
- A metal extension defines a collection of one or more state machines (SMs).
- There are two types of metal extensions:
  - Global extensions
  - Variable-specific extensions

Global Extensions

- A global extension tracks program-wide properties such as "interrupts are disabled".
- An interrupt is a signal that hardware or software sends when it needs the processor's attention. A critical section is a section of code that should not be stopped in the middle.
- While a program is executing a critical section, the programmer should turn off interrupts so that incoming interrupts are not acknowledged.
- At the kernel level, two simple instructions help build a critical section:
  - cli() is a mnemonic for CLear Interrupts: it turns off interrupts.
  - sti() is a mnemonic for SeT Interrupts: it turns on interrupts.
- When cli() is called, interrupts are disabled. No other system tasks can be performed until interrupts are enabled again by executing sti().

Global Extension: Interrupt Checkers

    // Global interrupt checker that warns
    // when interrupts are not restored
    sm cli_sti {
      enabled:
          { cli(); } ==> disabled
        | { sti(); } ==> stop, { err("Double sti"); }
      disabled:
          { sti(); } ==> enabled
        | { cli(); } ==> stop, { err("Double cli"); }
        | $end_of_path$ ==> stop, { err("Did not reverse"); }
      stop:
    }

Annotations on the code:

- enabled is the initial global state.
- { cli(); } and { sti(); } are patterns.
- $end_of_path$ is a special pattern that evaluates to true when the program terminates.

[Figure: state diagram of cli_sti with states enabled, disabled and stop. Transitions: enabled to disabled on cli(); disabled to enabled on sti(); enabled to stop on sti(); disabled to stop on cli(); disabled to stop on end_of_path]

- A global extension tracks transitions in a single SM that defines a list of global states.
- The first state in the list is implicitly defined as the initial global state of the SM.
- Each transition is defined with a pattern that identifies a source statement; when that statement is encountered in the source code, the transition executes.

Global Extension: Interrupt Checkers (Example)

The cli_sti checker above is applied to the following code:

    int fun_caller() {
        cli();      // goes to disabled state
        fun();
    }

    int fun(void) {
        if (random())
            sti();
    }

    void err(char* error) {
        print (error);
    }

First Execution: random() evaluates to false.

- The SM starts in the initial global state, enabled.
- cli() in fun_caller moves the SM to disabled.
- fun() does not call sti(), so the SM is still in disabled when the path ends.
- end_of_path moves the SM to stop, and the checker reports "Did not reverse".

Second Execution: random() evaluates to true.

- The SM starts in the initial global state, enabled.
- cli() in fun_caller moves the SM to disabled.
- sti() in fun moves the SM back to enabled.
- The path ends in enabled, so no error is reported.
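The two executions can be replayed with a small simulation of the cli_sti state machine. This is a sketch for illustration only: the transition table is transcribed from the metal checker, but the event names and the `run` helper are assumptions, and real metal extensions are executed by xgcc on source code rather than on event lists.

```python
# (current state, event) -> (next state, error message or None)
TRANSITIONS = {
    ("enabled", "cli"): ("disabled", None),
    ("enabled", "sti"): ("stop", "Double sti"),
    ("disabled", "sti"): ("enabled", None),
    ("disabled", "cli"): ("stop", "Double cli"),
    ("disabled", "end_of_path"): ("stop", "Did not reverse"),
}

def run(events):
    """Feed the events of one path to the checker; return (final state, errors)."""
    state, errors = "enabled", []          # first listed state is the initial one
    for event in list(events) + ["end_of_path"]:
        if state == "stop":
            break                          # stop has no outgoing transitions
        # Events without a matching pattern leave the state unchanged.
        next_state, error = TRANSITIONS.get((state, event), (state, None))
        if error:
            errors.append(error)
        state = next_state
    return state, errors

first = run(["cli"])           # random() is false: fun() skips sti()
second = run(["cli", "sti"])   # random() is true: fun() calls sti()
```

The first path ends in `stop` with the "Did not reverse" error, while the second ends in `enabled` with no error.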

Variable-Specific Extensions

- A variable-specific extension captures properties associated with specific program objects (any expression that has an associated state): structure fields, arithmetic expressions, pointers, etc.
- Examples of variable-specific properties:
  - A NULL pointer p should not be dereferenced.
  - A freed pointer p should not be used.
- A variable-specific extension comprises a series of state machines, each of which tracks the state attached to a single object.

Variable-Specific Extension: Free Checker

    sm free_checker {
      state decl any_pointer v;
      start:
          { kfree(v) } ==> v.freed
        ;
      v.freed:
          { *v } ==> v.stop,
              { err("using %s after free", mc_identifier(v)); }
        | { kfree(v) } ==> v.stop,
              { err("double free of %s", mc_identifier(v)); }
        ;
    }

- The keyword state defines a single typed identifier used to refer to the program object that a single SM is tracking.
- The keyword decl defines a metal hole variable that will match a source construct of the appropriate type. The hole variable v has the meta type any_pointer, which matches pointers to any type.
- The state of an SM consists of the value of the global instance and the value of one of the variable-specific instances.
- The state value start is bound to the implicitly defined global state.
- The notation v.freed means that the state value freed is bound to v.

- A variable-specific extension has two types of transitions:
  - Creation transition: it tells the analysis when to begin tracking a new object and is guarded by the identifier start. When the encountered code statement matches the pattern kfree(v), a new state machine is created to track the new instance of object v.
  - State transition: it describes the SM that each program object must follow.






    1: int contrived_caller (int *w, int x, int *p) {
    2:   kfree (p);
    3:   kfree (w);
    4:   contrived (p, w, x);
    5:   return *w;
    6: }

- Extension initial state: {(start, <>)}
  - The special value <> reflects the fact that the extension does not know about any freed variables.
- Extension state after line 2: {(start, <>), (start, v: p → freed)}
  - A new SM is created to track the pointer p.
- Extension state after line 3: {(start, <>), (start, v: p → freed), (start, v: w → freed)}
  - Another SM is created to track the pointer w.
- Thus, a variable-specific extension state is a collection of one or more SM states.
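The way the free checker creates one SM instance per tracked pointer can be sketched as follows. This is a simplified illustration, assuming the statements of a path have already been reduced to `("kfree", name)` and `("deref", name)` events; the event encoding and the `free_checker` function are hypothetical and do not reflect how xgcc matches metal patterns against source code.

```python
def free_checker(events):
    """Replay kfree/deref events; return (per-variable states, error messages)."""
    instances = {}   # variable name -> "freed" or "stop"; absent means untracked (<>)
    errors = []
    for action, name in events:
        state = instances.get(name)
        if state is None:
            if action == "kfree":              # creation transition from start
                instances[name] = "freed"
        elif state == "freed":
            if action == "deref":              # { *v } ==> v.stop
                errors.append(f"using {name} after free")
                instances[name] = "stop"
            elif action == "kfree":            # { kfree(v) } ==> v.stop
                errors.append(f"double free of {name}")
                instances[name] = "stop"
        # state == "stop": the instance is no longer tracked
    return instances, errors

# Lines 2 and 3 of contrived_caller: two SM instances are created.
after_line_3, no_errors = free_checker([("kfree", "p"), ("kfree", "w")])

# Lines 2, 3 and 5 (the call on line 4 is ignored in this sketch).
final_states, errors = free_checker(
    [("kfree", "p"), ("kfree", "w"), ("deref", "w")]
)
```

After line 3 both `p` and `w` are in the `freed` state, matching the extension state listed above; dereferencing `w` on line 5 then triggers the "using w after free" error.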

Metal Patterns

- Metal patterns are used to identify source code actions relevant to a given security rule, such as dereferencing pointers { *v } or freeing pointers { kfree(v) }.
- Patterns are written in an extended version of the source language (C) and can specify almost arbitrary language constructs such as declarations, expressions and statements.
- Metal patterns define the SM alphabet.
- A metal hole variable declared with the keyword decl will match a source construct of the appropriate type.

    Hole type        Matches
    any_expr         any legal expression
    any_scalar       any scalar value (int, float, etc.)
    any_pointer      any pointer of any type
    any_arguments    any argument list
    any_fn_call      any function call


Basic Blocks

- A basic block is a sequence of consecutive intermediate language statements in which flow of control can only enter at the beginning and leave at the end.
- Only the last statement of a basic block can be a branch statement, and only the first statement of a basic block can be the target of a branch.

Basic Block Partitioning Algorithm

1. Identify leader statements (i.e. the first statements of basic blocks) by using the following rules:

(i) The first statement in the program is a leader

(ii) Any statement that is the target of a branch statement is a leader

(iii) Any statement that immediately follows a branch or return statement is a leader

45

Example: Finding Leaders

The following code computes the inner product of two vectors.

Source code:

    begin
      prod := 0;
      i := 1;
      do begin
        prod := prod + a[i] * b[i];
        i := i + 1;
      end
      while i <= 20
    end

Three-address code:

    (1)  prod := 0
    (2)  i := 1
    (3)  t1 := 4 * i
    (4)  t2 := a[t1]
    (5)  t3 := 4 * i
    (6)  t4 := b[t3]
    (7)  t5 := t2 * t4
    (8)  t6 := prod + t5
    (9)  prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i <= 20 goto (3)
    (13) …



Applying the rules to the three-address code:

- Rule (i): statement (1) is a leader, because it is the first statement in the program.
- Rule (ii): statement (3) is a leader, because it is the target of the branch in statement (12).
- Rule (iii): statement (13) is a leader, because it immediately follows the branch in statement (12).

Forming the Basic Blocks

Now that we know the leaders, how do we form the basic blocks associated with each leader?

2. The basic block corresponding to a leader consists of the leader, plus all statements up to but not including the next leader or up to the end of the program.

Example: Forming the Basic Blocks

Basic Blocks:

    B1:  (1)  prod := 0
         (2)  i := 1

    B2:  (3)  t1 := 4 * i
         (4)  t2 := a[t1]
         (5)  t3 := 4 * i
         (6)  t4 := b[t3]
         (7)  t5 := t2 * t4
         (8)  t6 := prod + t5
         (9)  prod := t6
         (10) t7 := i + 1
         (11) i := t7
         (12) if i <= 20 goto (3)

    B3:  (13) …
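The two-step partitioning (find leaders, then form blocks) can be sketched on the three-address code of the example. This is a minimal sketch: the dictionary encoding of the code, the `goto (n)` branch syntax recognized by the regular expression, and the function names are assumptions for illustration, and statement (13) is kept as an opaque placeholder because the original elides it.

```python
import re

CODE = {
    1: "prod := 0", 2: "i := 1", 3: "t1 := 4 * i", 4: "t2 := a[t1]",
    5: "t3 := 4 * i", 6: "t4 := b[t3]", 7: "t5 := t2 * t4",
    8: "t6 := prod + t5", 9: "prod := t6", 10: "t7 := i + 1",
    11: "i := t7", 12: "if i <= 20 goto (3)", 13: "...",
}

def find_leaders(code):
    numbers = sorted(code)
    leaders = {numbers[0]}                          # rule (i): first statement
    for n in numbers:
        branch = re.search(r"goto \((\d+)\)", code[n])
        if branch:
            leaders.add(int(branch.group(1)))       # rule (ii): branch target
        ends_flow = branch is not None or code[n].startswith("return")
        if ends_flow and n + 1 in code:
            leaders.add(n + 1)                      # rule (iii): follows a branch/return
    return sorted(leaders)

def basic_blocks(code):
    # Each block runs from its leader up to, but not including, the next leader.
    leaders = find_leaders(code)
    ends = leaders[1:] + [max(code) + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]

leaders = find_leaders(CODE)
blocks = basic_blocks(CODE)
```

On this input the leaders are statements (1), (3) and (13), and the three resulting blocks correspond to B1, B2 and B3.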

Control Flow Graph (CFG)

- A control flow graph (CFG), or simply a flow graph, is a directed multigraph in which: (i) the nodes are basic blocks; and (ii) the edges represent flow of control (branches or fall-through execution).
- In a CFG we have no information about the data. Therefore an edge in the CFG means that the program may take that path.
- The basic block whose leader is the first intermediate language statement is called the start node.


There is a directed edge from basic block B1 to basic block B2 in the CFG if:

(1) There is a branch from the last statement of B1 to the first statement of B2, OR
(2) Control flow can fall through from B1 to B2 because:
    (i) B2 immediately follows B1, and
    (ii) B1 does not end with an unconditional branch.

Example: Control Flow Graph Formation

The basic blocks of the inner-product example are B1 = (1)-(2), B2 = (3)-(12) and B3 = (13) …. Applying the edge rules:

- B1 → B2 by Rule (2): B2 immediately follows B1, and B1 does not end with an unconditional branch.
- B2 → B2 by Rule (1): the last statement of B2, (12), branches to (3), the first statement of B2.
- B2 → B3 by Rule (2): B3 immediately follows B2, and the branch at the end of B2 is conditional.
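The two edge rules can be sketched in code for the three basic blocks of the example. This is an illustrative sketch under stated assumptions: blocks are given in program order as lists of statement numbers, only the last statement of each block is inspected, branches use the `goto (n)` syntax of the example, and the `cfg_edges` helper is hypothetical.

```python
import re

# Blocks in program order, with the statement numbers they contain.
BLOCKS = {"B1": [1, 2], "B2": list(range(3, 13)), "B3": [13]}
# Last statement of each block; B3's content is elided in the original.
LAST_STMT = {"B1": "i := 1", "B2": "if i <= 20 goto (3)", "B3": "..."}

def cfg_edges(blocks, last_stmt):
    names = list(blocks)
    block_of_leader = {stmts[0]: name for name, stmts in blocks.items()}
    edges = []
    for index, name in enumerate(names):
        stmt = last_stmt[name]
        branch = re.search(r"goto \((\d+)\)", stmt)
        if branch:
            # Rule (1): branch from the last statement to a block's first statement.
            edges.append((name, block_of_leader[int(branch.group(1))], "rule (1)"))
        falls_through = not (stmt.startswith("goto") or stmt.startswith("return"))
        if falls_through and index + 1 < len(names):
            # Rule (2): the next block follows and there is no unconditional branch.
            edges.append((name, names[index + 1], "rule (2)"))
    return edges

edges = cfg_edges(BLOCKS, LAST_STMT)
```

The result contains the fall-through edge B1 to B2, the loop edge B2 to B2 and the fall-through edge B2 to B3, as in the example above.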


- Control flow graph: a directed graph G = (V, E) in which each vertex in V is a basic block, and there is an edge v1 (BB1) → v2 (BB2) in E if BB2 can immediately follow BB1 in some execution sequence.
- Basic block: a sequence of consecutive operations in which flow of control enters at the beginning and leaves at the end, without halt or possibility of branching except at the end.
  - A BB has an edge to every block it can branch to.
- Standard representation used by many compilers.
- Often has 2 pseudo vertices:
  - entry node
  - exit node

[Figure: example CFG with nodes Entry, BB1, BB2, BB3, BB4, BB5, BB6, BB7 and Exit]


    int contrived(int *p, int *w, int x) {
        int *q;
        if (x) {
            kfree(w);
            q = p;
        }
        if (!x)
            return *w;
        return *q;
    }

    int contrived_caller(int *w, int x, int *p) {
        kfree(p);
        contrived(p, w, x);
        return *w;
    }

CFG blocks:

    B1    entry to contrived_caller
    B2    kfree(p);
    B3    contrived(p,w,x);
    B3'   return *w;
    B4    exit from contrived_caller
    B5    entry to contrived
    B6    int *q; if(x)
    B7    kfree(w); q=p;
    B8    if(!x)
    B9    return *w;
    B10   return *q;
    B11   exit from contrived

Intraprocedural Analysis-DFS

- xgcc uses the Depth-First Search (DFS) algorithm to traverse the CFG, starting at the entry block.
- A single control path is followed until the end of the function.
- The traversal then backtracks to the last branch point.
- The extension state is recorded at each block.
- If a block is reached again in an extension state that has already been recorded for it, the traversal of that path is aborted and backtracks to the last branch point.

[Figure: DFS traversal order over blocks B1, B2, B5, B3, B6, B4, B7]
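The traversal described above can be sketched as a recursive DFS that records, for every block, the extension states in which it has already been visited. This is a simplified sketch, not xgcc's implementation: it uses the cli_sti transition table from the interrupt checker, a hypothetical five-block CFG (the block names, edges and per-block events are invented for illustration), and it does not model function calls or infeasible paths.

```python
TRANSITIONS = {
    ("enabled", "cli"): ("disabled", None),
    ("enabled", "sti"): ("stop", "Double sti"),
    ("disabled", "sti"): ("enabled", None),
    ("disabled", "cli"): ("stop", "Double cli"),
    ("disabled", "end_of_path"): ("stop", "Did not reverse"),
}

def step(state, event):
    # Events that match no pattern leave the state unchanged.
    return TRANSITIONS.get((state, event), (state, None))

def analyze(cfg, events, entry, initial="enabled"):
    seen = {block: set() for block in cfg}   # states recorded at each block
    errors, visits = [], []

    def dfs(block, state):
        if state in seen[block]:
            return                 # same block, same state: abort and backtrack
        seen[block].add(state)
        visits.append(block)
        for event in events.get(block, []):
            state, error = step(state, event)
            if error:
                errors.append((block, error))
        if not cfg[block]:         # no successors: the path ends here
            state, error = step(state, "end_of_path")
            if error:
                errors.append((block, error))
        for successor in cfg[block]:
            dfs(successor, state)

    dfs(entry, initial)
    return errors, visits

# Hypothetical diamond-shaped CFG: B2 branches to B3 or B4, both join at B5.
CFG = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B5"], "B4": ["B5"], "B5": []}

# Case 1: sti() only on one arm, so the other path leaves interrupts disabled.
errors_1, visits_1 = analyze(CFG, {"B1": ["cli"], "B3": ["sti"]}, "B1")

# Case 2: sti() in the join block; B5 is reached twice in the same state.
errors_2, visits_2 = analyze(CFG, {"B1": ["cli"], "B5": ["sti"]}, "B1")
```

In the second case B5 is analyzed only once, because the second arrival carries the same state as the first. This relies on the checker being deterministic: the same block in the same state always yields the same result.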

Block Summary

- The extensions are applied to each basic block to check for security rule violations.
- Each basic block has a block summary that records:
  - All extension states that reach that block.
  - All SM transitions executed during the block analysis.
- Transition edges: (s, v: t → vs) ==> (s', v: t → v's)
  - Example transition edge of the Free Checker: (start, v: p → freed) ==> (start, v: p → stop)
- Add edges: (s, v: t → unknown) ==> (s', v: t → v's)
  - Example add edge of the Free Checker: (start, v: p → unknown) ==> (start, v: p → freed)
- Block summaries take advantage of the determinism property of metal extensions:
  - Applying a metal extension at the same program point with the same state always produces the same result.

Source code example

    int contrived(int *p, int *w, int x) {
        int *q;
        if (x) {
            kfree(w);
            q = p;
        }
        if (!x)
            return *w;    // safe
        return *q;        // using 'q' after free!
    }

    int contrived_caller(int *w, int x, int *p) {
        kfree(p);
        contrived(p, w, x);
        return *w;        // using 'w' after free!
    }

Block summaries recorded as the analysis walks the code:

    B1:   (start, <>) → (start, <>)

    B2:   (start, v: p → unknown) → (start, v: p → freed)

    B3:   (start, v: p → freed) → (start, v: p → freed)

    B7:   (start, v: w → unknown) → (start, v: w → freed)
          (start, v: q → unknown) → (start, v: q → freed)
          (start, v: p → freed) → (start, v: p → freed)

    B8:   (start, v: w → freed) → (start, v: w → freed)
          (start, v: q → freed) → (start, v: q → freed)
          (start, v: p → freed) → (start, v: p → freed)

    B10:  (start, v: w → freed) → (start, v: w → freed)
          (start, v: q → freed) → (start, v: q → stop)
          (start, v: p → freed) → (start, v: p → freed)

    B11:  (start, v: w → freed) → (start, v: w → freed)
          (start, v: p → freed) → (start, v: p → freed)

    B3':  (start, v: p → freed) → (start, v: p → freed)
          (start, v: w → freed) → (start, v: w → stop)

The transition q → stop in B10 corresponds to the error "using 'q' after free!" at return *q, and the transition w → stop in B3' corresponds to the error "using 'w' after free!" at return *w in contrived_caller.


Unsound approach

- Metal extensions and the xgcc interprocedural analysis are unsound.
  - Some security violations can remain undetected.
- MC focuses on executing metal extensions effectively to find as many security bugs as possible.
  - Metal can easily express violations of known correctness rules.
  - Metal automatically infers such rules from source code.

Conclusion

- MC has been used to detect over 100 security holes in Linux and BSD.
- MC is now a commercial tool: www.coverity.com
