Front End
[Figure: source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR]
Middle End
[Figure: AST → translation → IR1 → translation → IR2 → other IR and translation → asm]
Optimizations
[Figure: the same pipeline (AST → translation → IR1 → translation → IR2 → other IR and translation → asm), with "opt" passes attached along the pipeline]
General Scheme for Optimization
Analysis
  control flow, data flow, dependency, …
  to obtain conservative static knowledge of the program being optimized
  (an approximation of its dynamic behavior)
Rewriting
  rewrite the program based on the knowledge obtained above
[Figure: IR → analysis → static information → rewriting → IR’]
“Conservative Static”
[Figure: Cjump (x==5? L1: L2) branches to either "y = 1" or "y = 2"; both branches join at "print (y)"]
Can we substitute y with the value 2?
This amounts to proving that x is always equal to 5!
Suppose x is an input from the user: it is impossible to know its value statically. So one must be conservative when using static knowledge.
Liveness Analysis
Motivation
  Low-level IRs assume an infinite number of abstract “registers”
    good for code generation, but bad for execution on a real machine
    a real machine has a finite number of registers, so how do we bridge the gap?
  The goal of register allocation (an optimization) is to map an unbounded number of variables onto a few registers
    this needs liveness analysis
Example
Consider this TAC:
    a = 1
    b = a + 2
    c = b + 3
    return c
Three variables: a, b, and c.
Assume that the target machine has only one register: r.
Is it possible to put all three variables “a”, “b” and “c” in register “r”?
Example
Calculate which variables are “live” at each program point of the TAC
(the set under each statement is what is live right after it):
    a = 1
        {a}
    b = a + 2
        {b}
    c = b + 3
        {c}
    return c
The “liveness” information gives the live ranges.
The live ranges don’t overlap, thus all three variables can be put into one register.
Example
Register allocation (the live sets {a}, {b}, {c} never overlap):
    a => r
    b => r
    c => r
Code rewriting:
    r = 1
    r = r + 2
    r = r + 3
    return r
Data Flow Equations for Liveness
Inside basic blocks (backward):
    in[n] = use[n] ∪ (out[n] - def[n])
// Example:
    a = 1
    b = a + 2
    c = b + 3
    return c
// Example:
    a = 1
    b = a + 2
    c = b + 3
    return a + c
[Figure: each statement annotated with its in and out sets]
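The backward rule can be tried on the two examples with a short Python sketch. The function name `liveness_straight_line` and the (defs, uses) encoding of each statement are illustrative assumptions, not part of the lecture material.

```python
def liveness_straight_line(stmts, live_at_exit=frozenset()):
    """stmts: list of (defs, uses) pairs, one per statement, in program order.
    Returns the live-in set of every statement, computed in one backward pass."""
    live = set(live_at_exit)
    live_in = [None] * len(stmts)
    for i in range(len(stmts) - 1, -1, -1):
        defs, uses = stmts[i]
        live = set(uses) | (live - set(defs))  # in = use ∪ (out - def)
        live_in[i] = set(live)
    return live_in

# a = 1; b = a + 2; c = b + 3; return c
first = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"c"})]
# Same code, but the last statement is `return a + c`
second = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"a", "c"})]

print(liveness_straight_line(first))
print(liveness_straight_line(second))
```

In the second example "a" stays live across the definitions of "b" and "c", so the live ranges overlap and one register is no longer enough.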
For general CFG
Equations:
    in[n] = use[n] ∪ (out[n] - def[n])
    out[n] = ∪ in[s], for all s ∈ succ[n]
Fixpoint algorithm:
    initialize all in and out sets to {}
    loop until no set changes
[Figure: a CFG node n annotated with use[n], def[n], in[n], out[n]]
Example
    1: a = 0
    2: b = a + 1
    3: c = c + b
    4: a = b * 2
    5: a < N
    6: return c
(CFG edges: 1→2, 2→3, 3→4, 4→5, 5→2, 5→6)

node   1     2     3       4     5       6
def    {a}   {b}   {c}     {a}   {}      {}
use    {}    {a}   {b,c}   {b}   {a,N}   {c}

Loop over the nodes in the order 1, 2, 3, 4, 5, 6, using
    in[n] = use[n] ∪ (out[n] - def[n])
    out[n] = ∪ in[s], for all s ∈ succ[n]

node   in/out (init)   in/out (pass 1)   in/out (pass 2)   …
1      {} {}           {} {}             {} {a}            …
2      {} {}           {a} {}            {a} {b,c}         …
3      {} {}           {b,c} {}          {b,c} {b}         …
4      {} {}           {b} {}            {b} {a}           …
5      {} {}           {a} {a}           {a} {a,c}         …
6      {} {}           {c} {}            {c} {}            …

Final live_out:
    1: {a,c}   2: {b,c}   3: {b,c}   4: {a,c}   5: {a,c}   (and {c} on the edge into node 6)
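The fixpoint iteration on this six-node example can be sketched in Python as follows. The function name `liveness` and the dictionary encoding of the CFG are illustrative assumptions. N is left out of use[5], as in the iteration table, on the assumption that it is a constant.

```python
def liveness(succ, use, defs):
    """Iterate in[n] = use[n] ∪ (out[n] - def[n]) and
    out[n] = union of in[s] over successors s, until nothing changes."""
    nodes = sorted(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = use[n] | (live_out[n] - defs[n])
            new_out = set().union(*(live_in[s] for s in succ[n]))
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

# 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a < N; 6: return c
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(succ, use, defs)
for n in sorted(succ):
    print(n, sorted(live_in[n]), sorted(live_out[n]))
```

Visiting the nodes in the order 6, 5, 4, 3, 2, 1 instead would reach the same fixpoint in fewer passes, since liveness flows backward.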
Interference Graph
Final live_out for the six-node example above:
    1: {a,c}   2: {b,c}   3: {b,c}   4: {a,c}   5: {a,c}   (and {c} on the edge into node 6)
For any two variables x and y, if they are live simultaneously, then draw an (undirected) edge between x and y.
[Figure: interference graph with nodes a, b, c and edges a-c and b-c]
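A small Python sketch of how the edges follow from the live sets. The helper name `interference_edges` is an illustrative assumption, and edges are stored as sorted pairs so that each undirected edge appears once.

```python
from itertools import combinations

def interference_edges(live_sets):
    """Two variables interfere if they appear together in some live set."""
    edges = set()
    for live in live_sets:
        for x, y in combinations(sorted(live), 2):
            edges.add((x, y))
    return edges

# Final live_out sets of nodes 1..6 from the example
final_live_out = [{"a", "c"}, {"b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "c"}, set()]
edges = interference_edges(final_live_out)
print(sorted(edges))
```

In this graph a and b are not connected, so they could share a register, while c needs one of its own.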
Speeding up the analysis
  Ordering the nodes
    for liveness analysis: reverse top-sort order (you do this in lab 5)
  Once a variable
  Careful selection of set representation
  Careful data structure engineering
    say: bit-vector
  Basic blocks (you do this in lab 5)
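To illustrate the bit-vector idea, here is a minimal Python sketch that stores each variable set as an integer with one bit per variable, so union and difference become single bitwise operations. The names `VARS`, `to_bits`, `to_names` and `transfer` are hypothetical, and a real compiler would more likely use fixed-width machine words.

```python
VARS = ["a", "b", "c"]                      # one bit position per variable
BIT = {v: 1 << i for i, v in enumerate(VARS)}

def to_bits(names):
    """Encode a set of variable names as an integer bit-vector."""
    bits = 0
    for v in names:
        bits |= BIT[v]
    return bits

def to_names(bits):
    """Decode a bit-vector back into a set of variable names."""
    return {v for v in VARS if bits & BIT[v]}

def transfer(use, defs, out):
    """in = use ∪ (out - def), with union as | and difference as & ~."""
    return use | (out & ~defs)

# Statement `a = b * 2` with out = {a, c}
in_bits = transfer(to_bits({"b"}), to_bits({"a"}), to_bits({"a", "c"}))
print(to_names(in_bits))
```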
Basic Blocks
Step 1: calculate def and use for each basic block b
    one-pass backward calculation
Step 2: do liveness analysis on the blocks, just as discussed above
Step 3: calculate liveness information for each statement in each block
    one-pass backward calculation
Example
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = b * 2
              a < N
    block 3:  return c

block   1     2         3
def     {a}   {a,b,c}   {}
use     {}    {a,c}     {c}
The use set of block 2 does NOT contain variable “b”. Why?

Blocks are visited in reverse topo-sort order:
block   out/in (init)   out/in (pass 1)   out/in (pass 2)   out/in (pass 3)
3       {} {}           {} {c}            {} {c}            {} {c}
2       {} {}           {c} {a,c}         {a,c} {a,c}       {a,c} {a,c}
1       {} {}           {a,c} {c}         {a,c} {c}         {a,c} {c}

live_out for each block:
    1: {a,c}   2: {a,c}   3: {}

Backward calculation of live_out for each statement in block 2:
    b = a + 1    {b,c}
    c = c + b    {b,c}
    a = b * 2    {a,c}
    a < N        {a,c}
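Steps 1 and 3 can be sketched in Python for block 2 of the example. Both function names and the (defs, uses) encoding are illustrative assumptions; N is again treated as a constant.

```python
def block_def_use(stmts):
    """Step 1: summarize a block in one backward pass.
    A variable is in the block's use set only if it is read before
    any definition of it inside the block."""
    block_def, block_use = set(), set()
    for defs, uses in reversed(stmts):
        block_use = set(uses) | (block_use - set(defs))
        block_def |= set(defs)
    return block_def, block_use

def statement_live_out(stmts, block_live_out):
    """Step 3: recover live_out of every statement from the block's live_out."""
    live = set(block_live_out)
    result = [None] * len(stmts)
    for i in range(len(stmts) - 1, -1, -1):
        defs, uses = stmts[i]
        result[i] = set(live)                  # live right after statement i
        live = set(uses) | (live - set(defs))  # live right before statement i
    return result

# Block 2: b = a + 1; c = c + b; a = b * 2; a < N
block2 = [({"b"}, {"a"}), ({"c"}, {"b", "c"}), ({"a"}, {"b"}), (set(), {"a"})]

print(block_def_use(block2))
print(statement_live_out(block2, {"a", "c"}))
```

The use set comes out without "b" because every read of b in the block is preceded by the definition `b = a + 1`.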
Reaching Definition
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = b * 2
              a < N
    block 3:  return c
The problem: at any program point, we’d like to know where the value of a variable x is defined.
E.g., can we substitute the variable a with 0?
If so, we are doing the so-called constant propagation optimization.
Implementation
Number each definition:
    block 1:  5: a = 0
    block 2:  6: b = a + 1
              7: c = c + b
              8: a = b * 2
                 a < N
    block 3:     return c
(Here we number the four definitions with 5, 6, 7, 8, which have no special meaning, just:
  1. they are different from the block numbers, and
  2. they are all unique.)
Equations
Calculate def and kill for each block, based on the equations for a statement:
    def[d: x = …] = {d}
    kill[d: x = …] = defs(x) - {d}
For the example with definitions 5, 6, 7, 8:
    def[1] = {5}        kill[1] = {8}
    def[2] = {6,7,8}    kill[2] = {5}
    def[3] = {}         kill[3] = {}
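One way to lift the statement equations to whole blocks is sketched below in Python. The function name `block_def_kill` and the (label, variable) encoding are illustrative assumptions. The composition rule used here, where a later definition removes earlier definitions of the same variable from the block's def set, is the usual one but is not spelled out on the slide.

```python
def block_def_kill(block, defs_of):
    """block: list of (label, variable) definitions in program order.
    defs_of: variable -> set of all definition labels of that variable."""
    gen, kill = set(), set()
    for d, x in block:
        others = defs_of[x] - {d}     # kill[d: x = ...] = defs(x) - {d}
        gen = (gen - others) | {d}    # d survives; older defs of x do not
        kill |= others
    return gen, kill

defs_of = {"a": {5, 8}, "b": {6}, "c": {7}}
blocks = {1: [(5, "a")], 2: [(6, "b"), (7, "c"), (8, "a")], 3: []}

summary = {b: block_def_kill(stmts, defs_of) for b, stmts in blocks.items()}
for b in sorted(summary):
    print(b, summary[b])
```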
Data Flow Equation
Forward calculation:
    in[b] = ∪ out[q], for all q ∈ pred(b)
    out[b] = def[b] ∪ (in[b] - kill[b])
Fixpoint algorithm

block   1     2         3
def     {5}   {6,7,8}   {}
kill    {8}   {5}       {}

block   in/out (init)   in/out (pass 1)      in/out (pass 2)
1       {} {}           {} {5}               {} {5}
2       {} {}           {5} {6,7,8}          {5,6,7,8} {6,7,8}
3       {} {}           {6,7,8} {6,7,8}      {6,7,8} {6,7,8}

Definitions reaching each statement:
    5: a = 0        {}
    6: b = a + 1    {5,6,7,8}
    7: c = c + b    {5,6,7,8}
    8: a = b * 2    {5,6,7,8}
       a < N        {6,7,8}
       return c     {6,7,8}
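The forward fixpoint on the three blocks can be sketched in Python as follows. The function name `reaching_definitions` and the dictionary encoding are illustrative assumptions; the predecessor lists encode the loop from block 2 back to itself.

```python
def reaching_definitions(pred, gen, kill):
    """Iterate in[b] = union of out[q] over predecessors q and
    out[b] = def[b] ∪ (in[b] - kill[b]) until nothing changes."""
    blocks = sorted(pred)
    rd_in = {b: set() for b in blocks}
    rd_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set().union(*(rd_out[q] for q in pred[b]))
            new_out = gen[b] | (new_in - kill[b])
            if (new_in, new_out) != (rd_in[b], rd_out[b]):
                rd_in[b], rd_out[b] = new_in, new_out
                changed = True
    return rd_in, rd_out

pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: {5}, 2: {6, 7, 8}, 3: set()}
kill = {1: {8}, 2: {5}, 3: set()}

rd_in, rd_out = reaching_definitions(pred, gen, kill)
for b in sorted(pred):
    print(b, sorted(rd_in[b]), sorted(rd_out[b]))
```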
Constant Propagation
Definitions reaching each statement:
    5: a = 0        {}
    6: b = a + 1    {5,6,7,8}
    7: c = c + b    {5,6,7,8}
    8: a = b * 2    {5,6,7,8}
       a < N        {6,7,8}
       return c     {6,7,8}
Can we substitute the variable a in “b = a + 1” with the constant “0”?
No! Because there are two definitions of “a” which may reach this point: 5 and 8.
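The check behind this answer can be sketched in Python. The helper name `constant_value` and the table `definitions` (label to variable and constant right-hand side, with None for a non-constant one) are hypothetical encodings chosen for the illustration.

```python
def constant_value(var, reaching, definitions):
    """Return the constant that `var` must hold at a point, or None.
    Substitution is safe only if every reaching definition of `var`
    assigns the same known constant."""
    values = {definitions[d][1] for d in reaching if definitions[d][0] == var}
    if len(values) == 1:
        (value,) = values
        return value          # still None if the single value is non-constant
    return None

definitions = {5: ("a", 0), 6: ("b", None), 7: ("c", None), 8: ("a", None)}

# At `b = a + 1` the reaching set is {5, 6, 7, 8}: both 5 and 8 define a.
print(constant_value("a", {5, 6, 7, 8}, definitions))
# If only definition 5 reached the point, the substitution would be safe.
print(constant_value("a", {5}, definitions))
```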
Available Expressions
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = a + 1
              a < N
    block 3:  return c
E.g., has the right-side expression “a+1” of “a = a + 1” been calculated already, and is it thus available here?
If so, the second calculation can be avoided!
The problem: at a given program point, we’d like to know whether or not the value of an expression e has been calculated and is still available.
  1. The expression e must be calculated on every path to the point, and
  2. the variables used in e must not have been redefined after that calculation.
Implementation
Calculate gen and kill for each block, based on the equations for a statement. (Tiger table 17.4)
All possible expressions:
    ALL = {a+1, c+b}
    gen[1] = {}    kill[1] = {a+1}
    gen[2] = {}    kill[2] = ALL
    gen[3] = {}    kill[3] = {}
Implementation
Calculate in/out for each block, based on the fixpoint algorithm.
All available expressions:
    ALL = {a+1, c+b}

block   in/out (init)   in/out (pass 1)
1       {} ALL          {} {}
2       ALL ALL         {} {}
3       ALL ALL         {} {}
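The block-level fixpoint can be sketched in Python. The slides do not spell out the equations, so this sketch assumes the standard ones for available expressions: in[b] is the intersection of out[q] over all predecessors q (empty at the entry block) and out[b] = gen[b] ∪ (in[b] - kill[b]). The function name and the encoding of expressions as strings are illustrative assumptions.

```python
def available_expressions(pred, gen, kill, all_exprs, entry):
    """Forward 'must' analysis: start from ALL and shrink to the fixpoint."""
    blocks = sorted(pred)
    av_in = {b: set() if b == entry else set(all_exprs) for b in blocks}
    av_out = {b: set(all_exprs) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                new_in = set()            # nothing is available at program entry
            else:
                new_in = set(all_exprs)
                for q in pred[b]:
                    new_in &= av_out[q]   # must be available on every path
            new_out = gen[b] | (new_in - kill[b])
            if (new_in, new_out) != (av_in[b], av_out[b]):
                av_in[b], av_out[b] = new_in, new_out
                changed = True
    return av_in, av_out

ALL = {"a+1", "c+b"}
pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: set(), 2: set(), 3: set()}
kill = {1: {"a+1"}, 2: set(ALL), 3: set()}

av_in, av_out = available_expressions(pred, gen, kill, ALL, entry=1)
print(av_in, av_out)
```

Initializing to ALL rather than {} matters here because the sets are combined by intersection, so the iteration has to shrink them toward the answer.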
Implementation
Calculate in/out for each statement, based on the in/out for each block.
All available expressions:
    ALL = {a+1, c+b}
Expressions available at each program point:
    block 1:               {}
              a = 0
                           {}
    block 2:               {}
              b = a + 1
                           {a+1}
              c = c + b
                           {a+1}
              a = a + 1
                           {}
              a < N
                           {}
    block 3:               {}
              return c
                           {}
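The statement-level pass inside block 2 can be sketched in Python. The table `EXPRS`, the function name `available_before_each` and the (target, expression) encoding are hypothetical; the comparison `a < N` is not counted as an expression, following ALL = {a+1, c+b}.

```python
EXPRS = {"a+1": {"a"}, "c+b": {"b", "c"}}   # expression -> variables it reads

def available_before_each(stmts, avail_in):
    """stmts: list of (target, expr) in program order; either may be None.
    Returns the set available before each statement and the set at block end."""
    avail = set(avail_in)
    before = []
    for target, expr in stmts:
        before.append(set(avail))
        if expr is not None:
            avail.add(expr)                 # the statement computes expr
        if target is not None:              # assigning target kills its readers
            avail -= {e for e, reads in EXPRS.items() if target in reads}
    return before, avail

# Block 2: b = a + 1; c = c + b; a = a + 1; a < N
block2 = [("b", "a+1"), ("c", "c+b"), ("a", "a+1"), (None, None)]
before, at_end = available_before_each(block2, set())

# A statement is redundant if its expression is already available before it.
redundant = [i for i, (_, e) in enumerate(block2) if e in before[i]]
print(before, at_end, redundant)
```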
Common Sub-expression Elimination (CSE)
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = a + 1
              a < N
    block 3:  return c
E.g., has the right-side expression “a+1” of “a = a + 1” been calculated already, and is it thus available here?
After the available expression analysis, we know “a+1” is available before “a = a + 1”, so the second calculation can be omitted!
[Figure: the rewritten example, in which the right-hand side “a + 1” of the second calculation is replaced by “b”]
But which variable should the expression “a+1” be replaced with? We need to do reaching expression analysis... (Read the text and do homework!)