data flow analysis compiler baojian hua bjhua@ustc.edu.cn

Post on 28-Dec-2015


Data Flow Analysis

Compiler
Baojian Hua

bjhua@ustc.edu.cn

Front End

source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Middle End

AST -> translation -> IR1 -> translation -> IR2 -> other IR and translation -> asm

Optimizations

AST -> translation -> IR1 -> translation -> IR2 -> other IR and translation -> asm

An optimization pass (opt) can run on each representation along the way.

General Scheme for Optimization

Analysis: control flow, data flow, dependency, …, to obtain conservative static knowledge of the program being optimized (an approximation of its dynamic behavior).

Rewriting: rewrite the program based on the knowledge obtained above.

IR -> analysis -> static information -> rewriting -> IR’

“Conservative Static”

Cjump (x==5? L1: L2)

y = 1 y = 2

print (y)

Can we substitute y with the value 2?

This amounts to proving that x is always equal to 5!

Suppose x is an input from the user: it is impossible to know its value statically. So one must be conservative when using static knowledge.

Liveness Analysis

Motivation

Low-level IRs assume an infinite number of abstract “registers”: good for code generation, but bad for execution on a real machine.

A machine has a finite number of registers, so how do we bridge the gap?

The goal of register allocation (an optimization) is to fit an unbounded number of variables into a few registers. This needs liveness analysis.

Example

Consider this TAC with three variables a, b, and c:

a = 1
b = a + 2
c = b + 3
return c

Assume that the target machine has only one register: r.

Is it possible to put all three variables “a”, “b” and “c” in register “r”?

Example

Calculate which variables are “live” at a given program point. For the same TAC, the set after each statement is the set of variables live at that point:

a = 1
    {a}
b = a + 2
    {b}
c = b + 3
    {c}
return c

The “liveness” information gives live ranges.

The live ranges don’t overlap, thus all three variables can be put into one register.

Example

Register allocation:

a => r
b => r
c => r

Code rewriting:

r = 1
r = r + 2
r = r + 3
return r

Data Flow Equations for Liveness

Inside basic blocks (backward):

in[n] = use[n] \/ (out[n] - def[n])

// Example 1:
a = 1
b = a + 2
c = b + 3
return c

// Example 2:
a = 1
b = a + 2
c = b + 3
return a + c

Each statement is annotated with its in and out sets.
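The backward rule for straight-line code can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the function name `straight_line_liveness` and the encoding of each statement as a `(defs, uses)` pair are assumptions made for this sketch.

```python
def straight_line_liveness(stmts, live_at_exit=()):
    """Backward liveness over straight-line code.

    stmts is a list of (defs, uses) pairs, one per statement, in program
    order. Returns (ins, outs): the live-in and live-out set per statement.
    """
    ins = [set() for _ in stmts]
    outs = [set() for _ in stmts]
    live = set(live_at_exit)          # assumed: nothing is live after the return
    for i in reversed(range(len(stmts))):
        defs, uses = stmts[i]
        outs[i] = set(live)
        live = set(uses) | (live - set(defs))   # in = use \/ (out - def)
        ins[i] = set(live)
    return ins, outs


# Example 1: a = 1; b = a + 2; c = b + 3; return c
example1 = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"c"})]
# Example 2: a = 1; b = a + 2; c = b + 3; return a + c
example2 = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"a", "c"})]

ins1, outs1 = straight_line_liveness(example1)
ins2, outs2 = straight_line_liveness(example2)
print(outs1)
print(outs2)
```

In the first example every live-out set holds at most one variable. In the second, a stays live until the return, so the live-out sets after the second and third statements each contain two variables and a single register is no longer enough.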

For general CFG

Equations:

in[n] = use[n] \/ (out[n] - def[n])
out[n] = \/ s∈succ[n] in[s]

Fixpoint algorithm:

init all in and out sets with {}
loop until no set changes

Each node n carries four sets: use[n], def[n], in[n], out[n].

Example

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a < N        (successors: 2 and 6)
6: return c

node   1     2     3        4     5        6
def    {a}   {b}   {c}      {a}   {}       {}
use    {}    {a}   {b, c}   {b}   {a, N}   {c}

(N does not appear in any in/out set below, so it is evidently treated as a constant.)

Loop over the nodes in the order 1, 2, 3, 4, 5, 6:

node   in/out      in/out         in/out          …
1      {}  {}      {}     {}      {}     {a}      …
2      {}  {}      {a}    {}      {a}    {b,c}    …
3      {}  {}      {b,c}  {}      {b,c}  {b}      …
4      {}  {}      {b}    {}      {b}    {a,c}    …
5      {}  {}      {a}    {a}     {a}    {a,c}    …
6      {}  {}      {c}    {}      {c}    {}       …

Final live_out for nodes 1 to 5: {a,c} {b,c} {b,c} {a,c} {a,c}; {c} is live on entry to node 6.

in[n] = use[n] \/ (out[n] - def[n])

out[n] = \/ s∈succ[n] in[s]
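The fixpoint algorithm for this six-node example can be sketched in Python as below. The function name `liveness` and the dictionary encoding of the CFG are assumptions of this sketch, and N is treated as a constant, so it is left out of use[5]. Unlike the table, the sketch updates out[n] before in[n] within a pass, so intermediate iterations may differ from the table, but the final sets do not depend on that order.

```python
def liveness(succ, use, defs):
    """Iterate the liveness equations until no in/out set changes."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_out = set()
            for s in succ[n]:                       # out[n] = union of in[s]
                new_out |= live_in[s]
            new_in = use[n] | (new_out - defs[n])   # in[n] = use \/ (out - def)
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# The six-node example; edges are 1->2, 2->3, 3->4, 4->5, 5->2, 5->6.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(succ, use, defs)
for n in succ:
    print(n, sorted(live_in[n]), sorted(live_out[n]))
```

The resulting live_out sets for nodes 1 to 5 are {a,c}, {b,c}, {b,c}, {a,c}, {a,c}, and node 6 has an empty live_out.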

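Interference Graph

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a < N        (successors: 2 and 6)
6: return c

Final live_out for nodes 1 to 5: {a,c} {b,c} {b,c} {a,c} {a,c}; {c} is live on entry to node 6.

For any two variables x and y, if they are live simultaneously, then draw an (undirected) edge x - y.

Nodes: a, b, c. Edges: a - c and b - c.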
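The edge rule can be sketched in a few lines of Python. The helper name `interference_edges` is hypothetical, and the sketch follows only the simple rule stated on the slide (two variables in the same live set interfere); production allocators often refine this rule.

```python
from itertools import combinations


def interference_edges(live_sets):
    """Return undirected edges (x, y), x < y, for variables live together."""
    edges = set()
    for live in live_sets:
        for x, y in combinations(sorted(live), 2):
            edges.add((x, y))
    return edges


# Final live_out sets of nodes 1..6 from the example.
final_live_out = [{"a", "c"}, {"b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "c"}, set()]
edges = interference_edges(final_live_out)
print(sorted(edges))
```

Since a and b never appear in the same set, there is no edge between them, so they could share a register while c needs its own.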

Speeding up the analysis

Ordering the nodes
    for liveness analysis: reverse top-sort order
    You do this in lab 5

Once a variable

Careful selection of set representation
    Careful data structure engineering
    Say: bit-vector

Basic block
    You do this in lab 5
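As one illustration of the bit-vector idea, the sketch below stores each set as a Python integer with one bit per variable, so union becomes `|` and set difference becomes `& ~`. All helper names (`make_index`, `to_bits`, `from_bits`, `transfer`) are hypothetical and chosen for this sketch only.

```python
def make_index(variables):
    """Assign each variable a bit position."""
    return {v: i for i, v in enumerate(sorted(variables))}


def to_bits(names, index):
    """Encode a set of variable names as an integer bit-vector."""
    bits = 0
    for v in names:
        bits |= 1 << index[v]
    return bits


def from_bits(bits, index):
    """Decode an integer bit-vector back into a set of names."""
    return {v for v, i in index.items() if (bits >> i) & 1}


def transfer(use_bits, def_bits, out_bits):
    """in = use \\/ (out - def), computed with bitwise operations."""
    return use_bits | (out_bits & ~def_bits)


index = make_index({"a", "b", "c"})
# Node 3 of the example (c = c + b): use = {b,c}, def = {c}, out = {b,c}.
in_bits = transfer(to_bits({"b", "c"}, index),
                   to_bits({"c"}, index),
                   to_bits({"b", "c"}, index))
print(from_bits(in_bits, index))
```

Each transfer step then costs a handful of machine-word operations instead of a walk over set elements, which is the point of careful set representation.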

Basic Blocks

Step 1: calculate def and use for each basic block b
    one pass backward calculation

Step 2: do liveness analysis on each block
    just as discussed above

Step 3: calculate liveness information for each statement in each block
    one pass backward calculation

Example

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = b * 2
          a < N        (successors: block 2 and block 3)
block 3:  return c

block   1     2         3
def     {a}   {a,b,c}   {}
use     {}    {a,c}     {c}

This set (use[2]) does NOT contain variable “b”. Why?

Blocks are reverse topo-sort ordered:

block   out/in      out/in          out/in          out/in
3       {}  {}      {}     {c}      {}     {c}      {}     {c}
2       {}  {}      {c}    {a,c}    {a,c}  {a,c}    {a,c}  {a,c}
1       {}  {}      {a,c}  {c}      {a,c}  {c}      {a,c}  {c}

live_out for each block: block 1: {a,c}, block 2: {a,c}, block 3: {}

Backward calculation of live_out for each statement (the three points between the statements of block 2, bottom to top): {a,c}, {b,c}, {b,c}
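Steps 1 and 3 are both single backward passes over a block, which the sketch below illustrates for block 2. The function names `block_def_use` and `statement_live_out` and the `(defs, uses)` statement encoding are assumptions of this sketch, and N is again treated as a constant.

```python
def block_def_use(stmts):
    """Step 1: summarize a block by one backward pass.

    A variable is in use only if it is read before any write in the block.
    """
    defs, uses = set(), set()
    for d, u in reversed(stmts):
        defs |= d
        uses = u | (uses - d)
    return defs, uses


def statement_live_out(stmts, block_live_out):
    """Step 3: live_out of each statement, given the block's live_out."""
    outs = [set() for _ in stmts]
    live = set(block_live_out)
    for i in reversed(range(len(stmts))):
        d, u = stmts[i]
        outs[i] = set(live)
        live = u | (live - d)
    return outs


# Block 2: b = a + 1; c = c + b; a = b * 2; a < N
block2 = [({"b"}, {"a"}), ({"c"}, {"c", "b"}), ({"a"}, {"b"}), (set(), {"a"})]

defs2, uses2 = block_def_use(block2)
outs2 = statement_live_out(block2, {"a", "c"})
print(defs2, uses2)
print(outs2)
```

The summary shows why b is missing from use[2]: every read of b in the block comes after the write in `b = a + 1`, so the block never needs the value b had on entry.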

Reaching Definition

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = b * 2
          a < N        (successors: block 2 and block 3)
block 3:  return c

The problem: at any program point, we’d like to know where the value of a variable x is defined.

E.g., can we substitute the variable a with 0? If so, we are doing the so-called constant propagation optimization.

Implementation

Number each definition:

block 1:  5: a = 0
block 2:  6: b = a + 1
          7: c = c + b
          8: a = b * 2
             a < N
block 3:     return c

Here we number the four definitions 5, 6, 7, and 8. The numbers have no special meaning; they only need to be:

1. different from the block numbers, and

2. all unique.

Equations

block 1:  5: a = 0
block 2:  6: b = a + 1
          7: c = c + b
          8: a = b * 2
             a < N
block 3:     return c

Calculate def and kill for each block, based on the equations for a statement:

def[d: x = …] = {d}
kill[d: x = …] = defs(x) - {d}

def[1] = {5}        kill[1] = {8}
def[2] = {6,7,8}    kill[2] = {5}
def[3] = {}         kill[3] = {}
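The block-level def and kill sets follow mechanically from the statement rule, as the Python sketch below shows. The names `defs_by_variable` and `block_gen_kill` are hypothetical, and the sketch uses `gen` for the slide's def set because `def` is a Python keyword.

```python
def defs_by_variable(blocks):
    """Map each variable to the labels of all definitions of it: defs(x)."""
    table = {}
    for stmts in blocks.values():
        for label, var in stmts:
            table.setdefault(var, set()).add(label)
    return table


def block_gen_kill(stmts, defs_of):
    """Compose the statement-level rule over one block, in program order."""
    gen, kill = set(), set()
    for label, var in stmts:
        others = defs_of[var] - {label}   # kill[d: x = ...] = defs(x) - {d}
        gen = (gen - others) | {label}    # def[d: x = ...] = {d}
        kill |= others
    return gen, kill


# Each definition is (label, defined variable); a < N and return c define nothing.
blocks = {1: [(5, "a")], 2: [(6, "b"), (7, "c"), (8, "a")], 3: []}
defs_of = defs_by_variable(blocks)
summary = {b: block_gen_kill(stmts, defs_of) for b, stmts in blocks.items()}
print(summary)
```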

Data Flow Equation

Forward calculation:

in[b] = \/ q∈pred(b) out[q]
out[b] = def[b] \/ (in[b] - kill[b])

Fixpoint algorithm

block 1:  5: a = 0
block 2:  6: b = a + 1
          7: c = c + b
          8: a = b * 2
             a < N        (successors: block 2 and block 3)
block 3:     return c

block   1     2         3
def     {5}   {6,7,8}   {}
kill    {8}   {5}       {}

block   in/out      in/out               in/out
1       {}  {}      {}       {5}         {}         {5}
2       {}  {}      {5}      {6,7,8}     {5,6,7,8}  {6,7,8}
3       {}  {}      {6,7,8}  {6,7,8}     {6,7,8}    {6,7,8}

Definitions reaching each statement:

block 1:  {} before “a = 0”
block 2:  {5,6,7,8} before “b = a + 1”, {5,6,7,8} before “c = c + b”, {5,6,7,8} before “a = b * 2”, {6,7,8} before “a < N”
block 3:  {6,7,8} before “return c”
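A forward fixpoint over the three blocks can be sketched as follows. The function name `reaching_definitions` and the predecessor-map encoding are assumptions of this sketch, and `gen` again stands for the slide's def set.

```python
def reaching_definitions(pred, gen, kill):
    """Iterate the forward equations until no in/out set changes."""
    blocks = list(pred)
    rin = {b: set() for b in blocks}
    rout = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set()
            for q in pred[b]:                       # in[b] = union of out[q]
                new_in |= rout[q]
            new_out = gen[b] | (new_in - kill[b])   # out[b] = def \/ (in - kill)
            if new_in != rin[b] or new_out != rout[b]:
                rin[b], rout[b] = new_in, new_out
                changed = True
    return rin, rout


# Edges: 1 -> 2, 2 -> 2 (loop), 2 -> 3.
pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: {5}, 2: {6, 7, 8}, 3: set()}
kill = {1: {8}, 2: {5}, 3: set()}

rin, rout = reaching_definitions(pred, gen, kill)
for b in pred:
    print(b, sorted(rin[b]), sorted(rout[b]))
```

The loop edge from block 2 back to itself is what brings definitions 6, 7, and 8 into in[2] on the second pass.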

Constant Propagation

Definitions reaching each statement:

block 1:  5: a = 0          {}
block 2:  6: b = a + 1      {5,6,7,8}
          7: c = c + b      {5,6,7,8}
          8: a = b * 2      {5,6,7,8}
             a < N          {6,7,8}
block 3:     return c       {6,7,8}

Can we substitute the variable a in “b = a + 1” with the constant “0”?

No! Because there are two definitions of “a” which may reach this point: 5 and 8.
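The check behind this answer can be sketched as a small Python function. The name `constant_for_use` and the table `definitions` (label mapped to variable and constant value, with `None` for a non-constant right-hand side) are hypothetical and exist only for this illustration.

```python
def constant_for_use(var, reaching, definitions):
    """Return the constant to substitute for var, or None if unsafe.

    Substitution is safe only when every reaching definition of var
    assigns the same known constant.
    """
    values = {definitions[d][1] for d in reaching if definitions[d][0] == var}
    if len(values) == 1:
        (value,) = values
        return value          # still None when the single value is unknown
    return None


# label -> (variable, constant value or None when not a constant)
definitions = {5: ("a", 0), 6: ("b", None), 7: ("c", None), 8: ("a", None)}

at_b_equals_a_plus_1 = constant_for_use("a", {5, 6, 7, 8}, definitions)
if_only_5_reached = constant_for_use("a", {5}, definitions)
print(at_b_equals_a_plus_1, if_only_5_reached)
```

With both 5 and 8 reaching the use, the function declines to substitute; if only definition 5 reached it (for instance, without the loop), it would return 0.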

Available Expressions

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = a + 1
          a < N        (successors: block 2 and block 3)
block 3:  return c

The problem: at a given program point, we’d like to know whether or not the value of an expression e has been calculated and is still available:

1. the expression e must be calculated on every path to the point, and

2. the variables used in e must not have been redefined after that calculation.

E.g., has the right-side expression “a+1” in “a = a + 1” been calculated, and is it thus available there? If so, the second calculation can be avoided!

Implementation

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = a + 1
          a < N
block 3:  return c

Calculate gen and kill for each block, based on the equations for a statement (Tiger table 17.4).

All possible expressions: ALL = {a+1, c+b}

gen[1] = {}    kill[1] = {a+1}
gen[2] = {}    kill[2] = ALL
gen[3] = {}    kill[3] = {}

Implementation

Calculate in/out for each block, based on the fixpoint algorithm.

All possible expressions: ALL = {a+1, c+b}

gen[1] = {}    kill[1] = {a+1}
gen[2] = {}    kill[2] = ALL
gen[3] = {}    kill[3] = {}

block   in/out       in/out
1       {}   ALL     {}  {}
2       ALL  ALL     {}  {}
3       ALL  ALL     {}  {}
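The block-level computation can be sketched in Python as below. Two details are assumptions of this sketch rather than statements from the slides: the sets are combined by intersection over predecessors (an expression must be available on every incoming path), and every set except the entry block's in set starts at ALL, which matches the first column of the table. The name `available_expressions` is hypothetical.

```python
def available_expressions(pred, gen, kill, all_exprs, entry):
    """Forward fixpoint with intersection over predecessors."""
    blocks = list(pred)
    ain = {b: (set() if b == entry else set(all_exprs)) for b in blocks}
    aout = {b: set(all_exprs) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                new_in = set()                  # nothing is available on entry
            else:
                new_in = set(all_exprs)
                for q in pred[b]:               # in[b] = intersection of out[q]
                    new_in &= aout[q]
            new_out = gen[b] | (new_in - kill[b])
            if new_in != ain[b] or new_out != aout[b]:
                ain[b], aout[b] = new_in, new_out
                changed = True
    return ain, aout


ALL = {"a+1", "c+b"}
pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: set(), 2: set(), 3: set()}
kill = {1: {"a+1"}, 2: set(ALL), 3: set()}

ain, aout = available_expressions(pred, gen, kill, ALL, entry=1)
print(ain, aout)
```

Every block-level set ends up empty, as in the table; the interesting information only shows up inside block 2, at statement level.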

Implementation

Calculate in/out for each statement, based on the in/out for each block.

All possible expressions: ALL = {a+1, c+b}

block   in/out       in/out
1       {}   ALL     {}  {}
2       ALL  ALL     {}  {}
3       ALL  ALL     {}  {}

statement             available before    available after
block 1:  a = 0       {}                  {}
block 2:  b = a + 1   {}                  {a+1}
          c = c + b   {a+1}               {a+1}
          a = a + 1   {a+1}               {}
          a < N       {}                  {}
block 3:  return c    {}                  {}
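The per-statement pass through block 2 can be sketched as follows. The encoding is an assumption of this sketch: each statement is a `(target, expression)` pair, `operands` lists the variables each expression reads, and the comparison `a < N` is modeled as computing no tracked expression, since ALL contains only a+1 and c+b. The function name `available_after` is hypothetical.

```python
def available_after(stmt, avail_in, operands):
    """Apply one statement 'target = expr' to the available set.

    The computed expression becomes available, then every expression
    that reads the assigned target is killed.
    """
    target, expr = stmt
    out = set(avail_in)
    if expr is not None:
        out.add(expr)
    if target is not None:
        out = {e for e in out if target not in operands[e]}
    return out


operands = {"a+1": {"a"}, "c+b": {"c", "b"}}
# Block 2: b = a + 1; c = c + b; a = a + 1; a < N
block2 = [("b", "a+1"), ("c", "c+b"), ("a", "a+1"), (None, None)]

points = [set()]                 # in[2] = {} from the block-level result
for stmt in block2:
    points.append(available_after(stmt, points[-1], operands))
print(points)
```

`points[2]` is the set just before `a = a + 1`; it contains a+1, which is exactly the fact common sub-expression elimination relies on.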

Common Sub-expression Elimination (CSE)

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = a + 1
          a < N
block 3:  return c

Has the right-side expression “a+1” in “a = a + 1” been calculated, and is it thus available there? After the available expression analysis, we know “a+1” is available, so the second calculation can be omitted:

block 1:  a = 0
block 2:  b = a + 1
          c = c + b
          a = b
          a < N
block 3:  return c

But which variable should the expression “a+1” be replaced with? We need to do reaching expression analysis... (Read the text and do homework!)
