Static analysis Dataflow Dataflow frameworks CO444H Ben Livshits


Post on 24-Jul-2020


Page 1: Static analysis CO444H Dataflow Dataflow frameworkslivshits/classes/CO444H/slides/LEC2.pdf · Dataflow Analysis •Computes facts about values in the program •Little or no interaction

Static analysis

Dataflow

Dataflow frameworks

CO444H

Ben Livshits


Master’s Projects Available

1. Crashes to exploits

2. Pointer analysis for JavaScript

3. Private data management languages

4. Programming robots to assemble IKEA furniture

5. Project in software security

6. Security vulnerabilities in web browsers

7. Toward auditable financial software

8. User tracking in mobile browsers


We are in the Idealized World of CFGs

[Figure: example control-flow graphs whose basic blocks contain the statements t = x+y, a = t, b = t, and c = t]


Data Flow Equations


Dataflow Analysis

• Computes facts about values in the program

• Little or no interaction between facts

• Based on all paths through program

• Including, sometimes, infeasible paths

• Let’s consider some dataflow analyses…


Some Static Analysis Goals

• For example:

• What values can integer x have?

• What locations can pointer p point to?

• Can double y be negative?

• Can it assume value 17?

• etc.

• This is static reasoning – we are approximating runtime execution here


Static vs. Runtime

i = 1;

while(true){

i = i + 2;

if(…) break;

}

• How can we approximate the possible values of i?

• What can we conclude on the basis of this code?


i = 1;

while(i < 1000){

i = i + 2;

a = i*2;

}

• How about now?
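To make the idea of a static approximation concrete, the sketch below runs the second loop and checks that every observed value of i satisfies two facts an analysis might plausibly report: i is odd, and 1 ≤ i ≤ 1001. The choice of these two facts is an illustrative assumption, not something stated on the slide.

```python
def observed_values():
    """Run the second loop from the slide and record every value i takes."""
    seen = []
    i = 1
    seen.append(i)
    while i < 1000:
        i = i + 2
        a = i * 2  # computed as on the slide; not needed for the check below
        seen.append(i)
    return seen

values = observed_values()

# Candidate static facts (assumed for illustration): they must hold for
# every concrete value, but they may also admit values that never occur.
is_odd = all(v % 2 == 1 for v in values)
in_range = all(1 <= v <= 1001 for v in values)
```

A static analysis would have to establish such facts without running the loop; the run here only confirms that they are safe over-approximations for this program.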


Examples of Dataflow Analysis

• We will cover three common types of analysis:

• Reaching definitions

• Available expressions

• Live variables


Reaching Definitions


Reaching Definitions

• We will start this discussion by talking about an analysis called Reaching Definitions…

• A basic block can generate a definition

• A basic block can either:

• Kill a definition of x if it surely redefines x

• Transmit a definition if it may not redefine the same variable(s) as that definition


IN and OUT

The following sets are defined:

• IN(B) = set of definitions reaching the beginning of block B

• OUT(B) = set of definitions reaching the end of B


Equations

Two kinds of equations:

• Confluence equations: IN(B) in terms of OUTs of predecessors of B

• Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B


Confluence Equations

IN(B) = ∪_{P ∈ pred(B)} OUT(P)

[Figure: block B with predecessors P1 and P2; the OUT sets {d1, d2} and {d2, d3} are unioned to give IN(B) = {d1, d2, d3}]


Transfer Equations

• Generate a definition in the block if its variable is not definitely rewritten later in the basic block

• Kill a definition if its variable is definitely rewritten in the block

• An internal definition may be both killed and generated


Example: GEN and KILL


• For each basic block B1, B2, B3 we can compute GEN and KILL sets independently

• These will be part of the transfer function
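A minimal sketch of how GEN and KILL could be computed per block for reaching definitions. The representation (a block as a list of `(label, variable)` pairs) and the blocks `B1` and `B2` are hypothetical; the slide's own figure is not reproduced here. KILL follows the convention in which each statement kills every other definition of the variable it assigns, so a definition inside the block can end up both killed and generated.

```python
def gen_kill(block, all_defs):
    """Return (GEN, KILL) for one basic block.

    block    -- list of (label, variable) definitions, in program order
    all_defs -- every (label, variable) definition in the whole flow graph
    """
    last_def = {}   # variable -> label of its last definition in the block
    kill = set()
    for label, var in block:
        # This statement kills every other definition of the same variable.
        kill |= {d for d, v in all_defs if v == var and d != label}
        last_def[var] = label  # a later definition supersedes an earlier one
    return set(last_def.values()), kill

# Hypothetical two-block program.
B1 = [("d1", "x"), ("d2", "y")]
B2 = [("d3", "x"), ("d4", "x")]
all_defs = B1 + B2

gen1, kill1 = gen_kill(B1, all_defs)
gen2, kill2 = gen_kill(B2, all_defs)
```

In `B2`, `d4` is generated and also appears in KILL (it is killed by `d3`'s statement); this is harmless because GEN is added back after KILL is subtracted.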


Transfer Function for a Block

Connecting IN and OUT sets…

For any block B:

OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)


Iterative Solution --- (2)

IN(entry) = ∅;
for each block B do OUT(B) = ∅;
while (changes occur) do
    for each block B do {
        IN(B) = ∪_{P ∈ pred(B)} OUT(P);
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
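The loop above can be sketched in Python roughly as follows. The function name, the dictionary-based graph encoding, and the three-block flow graph (B1 → B2 → B3 with a back edge B3 → B2) are assumptions made for illustration only.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Round-robin iteration until no OUT set changes."""
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Confluence: union of OUT over all predecessors (empty for entry).
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            # Transfer: OUT(B) = (IN(B) - Kill(B)) ∪ Gen(B).
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT

# Hypothetical flow graph: d1 and d3 both define x, d2 defines y.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
kill = {"B1": {"d3"}, "B2": set(), "B3": {"d1"}}

IN, OUT = reaching_definitions(blocks, preds, gen, kill)
```

Starting every set at ∅ and only ever adding definitions is what makes the result the least fixed point for this union-based problem.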


Iterative Solution to Equations

• For an n-block flow graph, there are 2*n equations and 2*n unknowns.

• Alas, the solution is not unique.

• Standard theory assumes a field of constants; sets are not a field.

• Use an iterative solution to get the least fixed point.

• Identifies any def that might reach a point


Reaching Definitions: Algorithm in Action


[Figure: flow graph with blocks B1, B2, and B3 containing d1: x = 5, the test if x == 10, and d2: x = 15, annotated step by step with IN and OUT sets drawn from {d1, d2}, starting from IN(B1) = {}]


A bit-vector representation for greater computational efficiency
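As a rough sketch of the bit-vector idea: give each definition one bit, so that set union becomes bitwise OR and set difference becomes AND with the complement. The definition names and helper functions below are hypothetical, and Python integers stand in for fixed-width machine words.

```python
defs = ["d1", "d2", "d3"]
bit = {d: 1 << i for i, d in enumerate(defs)}  # d1 -> 0b001, d2 -> 0b010, ...

def to_bits(names):
    """Encode a set of definition names as an integer bit vector."""
    v = 0
    for n in names:
        v |= bit[n]
    return v

def to_names(v):
    """Decode a bit vector back into a set of definition names."""
    return {d for d in defs if v & bit[d]}

def transfer(in_bits, kill_bits, gen_bits):
    """OUT = (IN - KILL) ∪ GEN, expressed with bitwise operations."""
    return (in_bits & ~kill_bits) | gen_bits

out = transfer(to_bits({"d1", "d2"}), to_bits({"d1"}), to_bits({"d3"}))
out_names = to_names(out)
```

With this encoding, one transfer function costs a couple of word operations per machine word of definitions rather than per-element set manipulation.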


Aside: Notice the Conservatism

• Not only the most conservative assumption about when a def is KILLed or GEN’d

• Also the conservative assumption that any path in the flow graph can actually be taken

• Also, this is a may analysis, not a must analysis


Available Expressions


Another Data-Flow Problem: Available Expressions

• An expression x+y is available at a point if, no matter what path has been taken to that point from the entry, x+y has been evaluated, and neither x nor y has even possibly been redefined since

• Useful for global common-subexpression elimination


Available expressions example

• Watch out for things that are possibly KILLed by an assignment

2010 Stephen Chong, Harvard University


Defining GEN(B) and KILL(B)

• An expression x+y is generated if it is computed in B, and afterwards there is no possibility that either x or y is redefined

• An expression x+y is killed if it is not generated in B and either x or y is possibly redefined


Equations for Available Expressions

• The equations for AE are essentially the same as for RD, with one exception

• Confluence of paths involves intersection of sets of expressions rather than union of sets of definitions

• Available expressions is a forward must analysis

• Forward means that data facts flow from IN to OUT

• Must means that, at join points, we keep only facts that hold on all paths that are joined


Example of GEN and KILL for Available Expressions

x = x+y    (kills x+y, w*x, etc.)

z = a+b    (generates a+b; kills z-w, x+z, etc.)
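The example above can be reproduced with a small sketch that walks a block statement by statement. The encoding is an assumption made for illustration: an expression is a `(left, op, right)` tuple, a statement is a `(target, expression)` pair, and `universe` is the hypothetical set of all expressions of interest in the program.

```python
def ae_gen_kill(block, universe):
    """Return (GEN, KILL) for available expressions over one block."""
    gen, kill = set(), set()
    for target, expr in block:
        if expr is not None:
            gen.add(expr)  # the right-hand side is evaluated first
        # Assigning to target invalidates every expression that mentions it.
        dead = {e for e in universe if target in (e[0], e[2])}
        gen -= dead
        kill |= dead
    # An expression still generated at the end of the block is not killed.
    return gen, kill - gen

universe = {
    ("x", "+", "y"), ("w", "*", "x"), ("z", "-", "w"),
    ("x", "+", "z"), ("a", "+", "b"),
}
block = [("x", ("x", "+", "y")), ("z", ("a", "+", "b"))]

gen, kill = ae_gen_kill(block, universe)
```

Note that `x = x+y` evaluates x+y and then immediately kills it by redefining x, which is why x+y does not end up in GEN.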


Transfer Equations

• Transfer equation is exactly the same as before:

OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)

• Which is good – we can use the same template for all GEN/KILL problems


Confluence Equations

• Confluence involves intersection, because an expression is available coming into a block if and only if it is available coming out of each predecessor

IN(B) = ∩_{P ∈ pred(B)} OUT(P)


Iterative Solution

IN(entry) = ∅;
for each block B do OUT(B) = ALL;
while (changes occur) do
    for each block B do {
        IN(B) = ∩_{P ∈ pred(B)} OUT(P);
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
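A Python sketch of this loop, under assumed names and a hypothetical diamond-shaped flow graph (B1 branches to B2 and B3, which rejoin at B4). Expressions are plain strings here, and `universe` plays the role of ALL.

```python
def available_expressions(blocks, entry, preds, gen, kill, universe):
    """Round-robin iteration for the forward must problem."""
    IN = {b: set() for b in blocks}
    OUT = {b: set(universe) for b in blocks}  # optimistic start: ALL
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                IN[b] = set()  # nothing is available on entry
            else:
                # Confluence: intersection of OUT over all predecessors.
                IN[b] = set(universe).intersection(*(OUT[p] for p in preds[b]))
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT

universe = {"x+y", "a+b"}
blocks = ["B1", "B2", "B3", "B4"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"x+y"}, "B2": {"a+b"}, "B3": {"a+b"}, "B4": set()}
kill = {"B1": set(), "B2": {"x+y"}, "B3": set(), "B4": set()}

IN, OUT = available_expressions(blocks, "B1", preds, gen, kill, universe)
```

Here x+y is killed on the path through B2, so only a+b, which is generated on both branches, is available on entry to B4.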


Why It Works

• An expression x+y is unavailable at point p iff there is a path from the entry to p that either:

1. Never evaluates x+y, or

2. Kills x+y after its last evaluation

• IN(entry) = ∅ takes care of #1 above

• OUT(B) = ALL, plus intersection during iteration handles #2 above


Example of Why We Want Intersection

[Figure: paths from Entry to point p, with path segments labeled "x+y never GEN'd", "x+y killed", and "x+y never GEN'd"]


Subtle Point

• It is conservative to assume an expression isn’t available, even if it is

• But we don’t have to be “insanely conservative”

• If, after considering all paths and assuming x+y is killed by any possibility of redefinition, we still can’t find a path explaining its unavailability, then x+y is available

• This is a delicate dance between soundness and precision


How Would the Algorithm Change for A Backwards Analysis?


Live Variables


Live Variable Analysis

• Variable x is live at a point p if on some path from p, x is used before it is redefined

• Useful in code generation: if x is not live on exit from a basic block, there is no need to copy x from a register to memory

• Captures if there is a demand for a variable


Equations for Live Variables

• LV is essentially a “backwards” version of RD

• In place of GEN(B): Use(B) = set of variables x possibly used in B prior to any certain definition of x

• In place of KILL(B): Def(B) = set of variables x certainly defined before any possible use of x
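A small sketch of computing Use(B) and Def(B) for straight-line code. The statement encoding (`(target, operands)` pairs) and the sample block are hypothetical; with no aliasing or calls, "possibly" and "certainly" coincide, which is a simplifying assumption.

```python
def use_def(block):
    """Return (Use, Def) for one basic block of simple assignments."""
    use, defs = set(), set()
    for target, operands in block:
        # A read counts as a use only if the variable has not already
        # been defined earlier in this block.
        use |= {v for v in operands if v not in defs}
        # A write counts as a def only if the variable was not used first.
        if target not in use:
            defs.add(target)
    return use, defs

# t = x + y;  x = t;  y = y + z
block = [("t", ["x", "y"]), ("x", ["t"]), ("y", ["y", "z"])]
use, defs = use_def(block)
```

Only t lands in Def here: x and y are both read before they are written, so their incoming values are still demanded by the block.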


Transfer Equations

• Transfer equations give IN’s in terms of OUT’s:

IN(B) = (OUT(B) – Def(B)) ∪ Use(B)

• This is a little different – the direction is reversed


Confluence Equations

• Confluence involves union over successors, so a variable is in OUT(B) if it is live on entry to any of B’s successors.

OUT(B) = ∪_{S ∈ succ(B)} IN(S)


Iterative Solution for Live Variables

OUT(exit) = ∅;
for each block B do IN(B) = ∅;
while (changes occur) do
    for each block B do {
        OUT(B) = ∪_{S ∈ succ(B)} IN(S);
        IN(B) = (OUT(B) – Def(B)) ∪ Use(B);
    }
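The backward loop can be sketched the same way as the forward ones. The names and the three-block graph below (B1: x = 1; B2: y = x + 1, with a self-loop; B3: return y) are assumptions for illustration.

```python
def live_variables(blocks, succs, use, defs):
    """Round-robin iteration for the backward may problem."""
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Confluence: union of IN over all successors (empty for exit).
            OUT[b] = set().union(*(IN[s] for s in succs[b]))
            # Transfer runs backward: IN(B) = (OUT(B) - Def(B)) ∪ Use(B).
            new_in = (OUT[b] - defs[b]) | use[b]
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT

blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"x"}, "B3": {"y"}}
defs = {"B1": {"x"}, "B2": {"y"}, "B3": set()}

IN, OUT = live_variables(blocks, succs, use, defs)
```

Visiting blocks in reverse order would typically converge in fewer passes for a backward problem, but any order reaches the same fixed point.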


Data-Flow Frameworks

Lattice-Theoretic Formulation

Meet-Over-Paths Solution

Monotonicity/Distributivity


Data-Flow Analysis Frameworks

• Generalizes and unifies each of the DFA examples from previous lecture.

• Important ingredients:

Element             Symbol  Explanation
Direction           D       forward or backward
Domain              V       possible values for IN, OUT
Meet operator       ∧       effect of path confluence
Transfer functions  F       effect of passing through a basic block


Good News!

• All three analyses above fit the model:

• RD’s: Forward, meet = union, transfer functions based on GEN and KILL

• AE’s: Forward, meet = intersection, transfer functions based on GEN and KILL

• LV’s: Backward, meet = union, transfer functions based on USE and DEF


May vs. Must Analysis

            May                    Must
Forward     Reaching definitions   Available expressions
Backward    Live variables         Very busy expressions


Semilattices

We say that a set V and operation meet (denoted ∧) form a semilattice if for all x, y, and z in V:

1. x ∧ x = x (idempotence)

2. x ∧ y = y ∧ x (commutativity)

3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity)

4. Top element ⊤ such that for all x, ⊤ ∧ x = x.

5. Bottom element (optional) ⊥ such that for all x: ⊥ ∧ x = ⊥


Available expressions (semi)lattice


In this example we have a+b, a+1, a*b as possible computations in this program


Example: Semilattice

• V = power set of some set (like previous example)

• ∧ = union

• Union is idempotent, commutative, and associative

• What are the top and bottom elements?
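One way to convince yourself of these properties is to check them exhaustively on a small power set. The sketch below assumes a three-element base set (named after the expressions in the previous example) and uses union as the meet; it also tests candidate top and bottom elements against axioms 4 and 5.

```python
from itertools import combinations

base = ["a+b", "a+1", "a*b"]
# V = power set of base, as frozensets so elements are hashable.
V = [frozenset(c) for r in range(len(base) + 1) for c in combinations(base, r)]

def meet(x, y):
    """Meet operator for this example: set union."""
    return x | y

idempotent = all(meet(x, x) == x for x in V)
commutative = all(meet(x, y) == meet(y, x) for x in V for y in V)
associative = all(
    meet(x, meet(y, z)) == meet(meet(x, y), z)
    for x in V for y in V for z in V
)

# Candidates: top must satisfy top ∧ x = x, bottom must satisfy bottom ∧ x = bottom.
top = frozenset()
bottom = frozenset(base)
top_ok = all(meet(top, x) == x for x in V)
bottom_ok = all(meet(bottom, x) == bottom for x in V)
```

With union as the meet, the empty set behaves as ⊤ and the full base set as ⊥; with intersection as the meet the roles would be swapped.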


Partial Order for a Semilattice

• Say x ≤ y iff x ∧ y = x

• Also, x < y iff x ≤ y and x ≠ y

• ≤ is really a partial order:

1. x ≤ y and y ≤ z imply x ≤ z (proof in the Dragon book)

2. x ≤ y and y ≤ x iff x = y.

Proof:

• x ∧ y = x and y ∧ x = y.

• Thus, x = x ∧ y = y ∧ x = y
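The order induced by a meet can also be checked by brute force. This sketch assumes the union meet on a small power set of hypothetical definitions d1 to d3, derives ≤ from the rule x ≤ y iff x ∧ y = x, and tests the partial-order properties along with how ≤ relates to set inclusion.

```python
from itertools import combinations

defs = ["d1", "d2", "d3"]
V = [frozenset(c) for r in range(len(defs) + 1) for c in combinations(defs, r)]

def meet(x, y):
    """Meet operator assumed for this example: set union."""
    return x | y

def leq(x, y):
    """x ≤ y iff x ∧ y = x."""
    return meet(x, y) == x

reflexive = all(leq(x, x) for x in V)
transitive = all(
    leq(x, z)
    for x in V for y in V for z in V
    if leq(x, y) and leq(y, z)
)
antisymmetric = all(x == y for x in V for y in V if leq(x, y) and leq(y, x))

# With union as meet, x ≤ y holds exactly when x is a superset of y.
matches_superset = all(leq(x, y) == (x >= y) for x in V for y in V)
```

Reflexivity is not listed on the slide but follows directly from idempotence, since x ∧ x = x.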


Axioms for Transfer Functions

1. The set of transfer functions F includes the identity function.

• Why needed? Constructions often require introduction of an empty block.

2. F is closed under composition.

• Why needed?

• The concatenation of two blocks is a block.

• Transfer function for a block can be constructed from individual statements.


Example: Reaching Definitions

• Direction D = forward.

• Domain V = set of all sets of definitions in the flow graph.

• ∧ = union.

• Functions F = all “gen-kill” functions of the form f(x) = (x - K) ∪ G, where K (the KILL set) and G (the GEN set) are sets of definitions (members of V).


Example: Satisfies Axioms

• Union on a power set forms a semilattice (idempotent, commutative, associative).

• Identity function: let K = G = ∅.

• Composition: A little algebra.
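The "little algebra" can be spelled out as follows, writing f_1 and f_2 for two gen-kill functions with sets K_1, G_1 and K_2, G_2 (these subscripted names are introduced here for the derivation and are not on the slide). Set difference distributes over union, which gives the second step.

```latex
\begin{aligned}
f_1(x) &= (x - K_1) \cup G_1, \qquad f_2(x) = (x - K_2) \cup G_2 \\
f_2(f_1(x)) &= \bigl(((x - K_1) \cup G_1) - K_2\bigr) \cup G_2 \\
&= \bigl((x - K_1) - K_2\bigr) \cup (G_1 - K_2) \cup G_2 \\
&= \bigl(x - (K_1 \cup K_2)\bigr) \cup \bigl((G_1 - K_2) \cup G_2\bigr)
\end{aligned}
```

So the composition is again a gen-kill function, with K = K_1 ∪ K_2 and G = (G_1 − K_2) ∪ G_2.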


Example: Partial Order

• For RD’s, S ≤ T means S ∪ T = S.

• Equivalently S ⊇ T.

• Seems “backward,” but that’s what the definitions give you

• Intuition: ≤ measures “ignorance.”

• The more definitions we know about, the less ignorance we have.

• ⊤ = “total ignorance.”


DFA Frameworks

• (D, V, ∧, F)

• A flow graph, with an associated function f_B in F for each block B

• A boundary value v_ENTRY or v_EXIT if D = forward or backward, respectively.


Iterative Algorithm (Forward)

OUT(entry) = v_ENTRY;
for (other blocks B) OUT(B) = ⊤;
while (changes to any OUT)
    for (each block B) {
        IN(B) = ∧_{P ∈ pred(B)} OUT(P);
        OUT(B) = f_B(IN(B));
    }
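A sketch of the generic forward algorithm, parameterized by the framework ingredients. All names here are assumptions, as is the decision to treat `entry` as a distinguished empty node that is skipped in the loop; the instantiation at the bottom plugs in reaching definitions on a hypothetical three-block graph.

```python
from functools import reduce

def solve_forward(blocks, entry, preds, transfer, meet, top, v_entry):
    """Generic iterative solver: meet over predecessors, then apply f_B."""
    OUT = {b: top for b in blocks}
    OUT[entry] = v_entry          # boundary value
    IN = {}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue          # assumed: entry is an empty boundary node
            # Folding from top is safe because top ∧ x = x.
            IN[b] = reduce(meet, (OUT[p] for p in preds[b]), top)
            new_out = transfer[b](IN[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT

def gen_kill_fn(gen, kill):
    """Build the transfer function f(x) = (x - K) ∪ G."""
    return lambda x: (x - kill) | gen

# Reaching definitions as an instance: meet = union, top = ∅, v_ENTRY = ∅.
blocks = ["entry", "B1", "B2", "B3"]
preds = {"B1": ["entry"], "B2": ["B1", "B3"], "B3": ["B2"]}
transfer = {
    "B1": gen_kill_fn(frozenset({"d1"}), frozenset({"d3"})),
    "B2": gen_kill_fn(frozenset({"d2"}), frozenset()),
    "B3": gen_kill_fn(frozenset({"d3"}), frozenset({"d1"})),
}
IN, OUT = solve_forward(
    blocks, "entry", preds, transfer,
    meet=lambda x, y: x | y, top=frozenset(), v_entry=frozenset(),
)
```

Swapping in intersection as the meet and the full expression set as ⊤ would give available expressions; passing successors in place of `preds` and the exit node in place of `entry` is one way to reuse the same loop for a backward problem.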


Iterative Algorithm (Backward)

Almost the same thing – just make a few changes:

1. Swap IN and OUT everywhere

2. Replace ENTRY by EXIT

GCC optimizations

• Why does gcc generate 15-20% faster code if I optimize for size instead of speed?

• http://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed


Multiple Processors

• By default, compilers optimize for an "average" processor. Since different processors favor different instruction sequences, compiler optimizations enabled by -O2 might benefit the average processor but decrease performance on your particular processor (and the same applies to -Os).

• If you try the same example on different processors, you will find that some of them benefit from -O2 while others respond better to -Os optimizations.