CMPE 511 Computer Architecture
A Faster Optimal Register Allocator
Betül Demiröz
8 December 2005
Outline
  Motivation of the Study
  Register Allocation Problem
  Classical Methods (Chaitin & Briggs)
  Optimal Register Allocator
  Experimental Study
Motivation of the Study
  Challenges of compilers for embedded systems
    Power consumption, memory space limitations
    Small set of applications
  Such compilers can afford long compilation times to generate good-quality code in the various phases
    instruction selection
    instruction scheduling
    register allocation
Motivation of the Study (2)
  Instruction Selection
    selecting target machine instructions to implement primitive IR (Intermediate Representation) code instructions
    affects the quality of the code
  Instruction Scheduling
    ordering the operations in the compiled code
    decreases the running time of the compiled code
Register Allocation
  Problem
    assigning program variables to the available registers
    shapes the runtime performance of the compiled code
  Failure to provide an efficient register allocation causes
    an increase in the number of memory accesses
    an increase in code size (affects memory capacity and the overall form factor of the device)
    an increase in power consumption (frequent memory accesses due to poor register allocation)
Register Allocation (2)
  NP-complete (Garey & Johnson, 1976)
  Approaches
    Graph Coloring: Chaitin (1981)
    Integer Programming: Goodwin and Wilken (1996)
Graph Coloring
  Traditional solution to the register allocation problem.
  A graph is used to represent the registers.
  Each node represents a register, and an edge between two nodes shows that those registers are live at the same point in the program.
  Nodes joined by an edge must be colored with different colors.
Graph Coloring (2)
  Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
  Spill cost: the runtime cost of loading a variable from memory and storing it to memory
    address computation, memory operation, execution frequency
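
The three spill cost components listed above can be combined in many ways. The following is only a minimal sketch of one plausible estimate: each reference to the variable pays an address computation plus a memory operation, weighted by how often it executes. The `Reference` record, the function name and the unit costs are all assumptions made for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One definition or use of a variable (hypothetical record type)."""
    is_store: bool    # True for a definition (store), False for a use (load)
    frequency: float  # estimated execution frequency of the enclosing block


def spill_cost(refs, addr_cost=1.0, load_cost=2.0, store_cost=2.0):
    """Estimate the runtime cost of keeping a variable in memory.

    The unit costs are illustrative assumptions, not measured values.
    """
    total = 0.0
    for ref in refs:
        mem_cost = store_cost if ref.is_store else load_cost
        # address computation + memory operation, weighted by frequency
        total += (addr_cost + mem_cost) * ref.frequency
    return total
```

A variable defined once outside a loop and used ten times inside it would be passed as `[Reference(True, 1), Reference(False, 10)]`.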
Live Ranges
  A variable Vi is live at a point p in the program if
    it is defined above p and has not yet been used for the last time.
  Live range (LRi)
    begins with the definition of Vi and ends with the last use of Vi
  If LRi and LRj are simultaneously live at p, then LRi interferes with LRj
    Interfering live ranges cannot be stored in the same register.
  Interference graph GI = G(V, E)
    V = set of individual live ranges
    E = set of edges that represent interferences
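
As a rough illustration of the definitions above, the sketch below builds an interference graph for straight-line code. It assumes each live range is a half-open interval `(start, end)` of instruction indices, from the definition up to the last use. That representation and the function name are assumptions made here; real allocators compute liveness over a control flow graph.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Return {name: set of interfering names}.

    live_ranges maps a name to a half-open interval (start, end).
    Two ranges interfere when they are live at a common point.
    """
    graph = {name: set() for name in live_ranges}
    for (u, (su, eu)), (v, (sv, ev)) in combinations(live_ranges.items(), 2):
        if max(su, sv) < min(eu, ev):  # the two intervals overlap
            graph[u].add(v)
            graph[v].add(u)
    return graph


ranges = {"a": (0, 5), "b": (1, 3), "c": (5, 8)}
graph = build_interference_graph(ranges)
```

Here `a` and `b` overlap, so they need different registers, while `c` starts only after `a` ends and can reuse its register.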
Source Code

    int main() {
        int a;
        int b;
        int i;
        a = 10;
        b = 1;
        i = 0;
        while (i <= a) {
            b += b * i;
            i++;
            if (b >= 100) break;
        }
        return 0;
    }

GaS (GNU Assembler)

    main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        andl $-16, %esp
        movl $0, %eax
        subl %eax, %esp
        movl $10, -4(%ebp)
        movl $1, -8(%ebp)
        movl $0, -12(%ebp)
    .L2:
        movl -12(%ebp), %eax
        cmpl -4(%ebp), %eax
        jle .L4
        jmp .L3
        .....
Extended Representation

    main:
        subl $4, t1 (t2)
        movl t3, t2
        movl t2, t3 (t4)
        subl $24, t2 (t5)
        andl $-16, t5 (t6)
        movl $0, t7
        subl t7, t6 (t8)
        movl $10, t4
        movl $1, t4
        movl $0, t4
    .L2:
        movl t4, t7 (t9)
        cmpl t4, t9
        .....

Interference Graph
  [Figure: interference graph with nodes t1 through t15]
Classical Methods for Register Allocation
  Register allocators based on graph coloring
    Chaitin’s Heuristic (limitations for diamond graphs)
    Optimistic Coloring Heuristic (Briggs)
  Stack-based methods
Chaitin’s Heuristic
  Initialize stack S to empty.
  while (GI ≠ ∅) do
    while ∃ a vertex v of GI such that degree(v) < k do
      Pick any vertex v such that degree(v) < k.
      Remove v and its edges from GI and put v on S.
    if (GI ≠ ∅) then
      Pick a vertex v based on the given spill metric.
      Spill the live range associated with v.
      Remove v and its edges from GI.
  while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
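
The heuristic above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the presenter's implementation: the graph is a dict from node to neighbor set, the spill metric is assumed to be cost divided by degree, and ties are broken by name so that the result is deterministic.

```python
def chaitin_color(graph, k, spill_cost):
    """Return (colors, spilled) for an interference graph and k registers."""
    work = {v: set(ns) for v, ns in graph.items()}  # mutable copy of GI
    stack, spilled = [], []

    def remove(v):
        for n in work.pop(v):
            work[n].discard(v)

    while work:
        low = [v for v in work if len(work[v]) < k]
        if low:
            v = min(low)  # any low-degree vertex is acceptable
            stack.append(v)
        else:
            # Assumed spill metric: lowest cost per interference edge.
            v = min(work, key=lambda n: (spill_cost[n] / max(len(work[n]), 1), n))
            spilled.append(v)
        remove(v)

    colors = {}
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        # A free color always exists: v had fewer than k neighbors when pushed.
        colors[v] = next(c for c in range(k) if c not in used)
    return colors, spilled


# Four-node cycle A-B-C-D-A with two registers (colors 0 and 1).
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
colors, spilled = chaitin_color(cycle, 2, {n: 1.0 for n in cycle})
```

On this cycle every node has degree 2, so no node can be removed with k = 2 and one node is spilled immediately, even though the graph is 2-colorable.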
Chaitin-Briggs Heuristic (OCH)
  Initialize stack S to empty.
  while (GI ≠ ∅) do
    while ∃ a vertex v of GI such that degree(v) < k do
      Pick any vertex v such that degree(v) < k.
      Remove v and its edges from GI and put v on S.
    if (GI ≠ ∅) then
      Pick a vertex v based on the given spill metric.
      Push v on the stack.
      Remove v and its edges from GI.
  while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
    If node v cannot be colored, then pick an uncolored node to spill, spill it, and restart at step 1.
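
A minimal Python sketch of the optimistic variant follows, under the same assumptions as before (dict-of-sets graph, cost divided by degree as the spill metric, ties broken by name). One further simplification is assumed here: when a node cannot be colored, that node itself is the one spilled before restarting.

```python
def briggs_color(graph, k, spill_cost):
    """Optimistic coloring sketch: return (colors, spilled)."""
    graph = {v: set(ns) for v, ns in graph.items()}  # private copy
    spilled = []
    while True:
        work = {v: set(ns) for v, ns in graph.items()}
        stack = []
        while work:
            low = [v for v in work if len(work[v]) < k]
            if low:
                v = min(low)
            else:
                # Spill candidate: pushed optimistically, not spilled yet.
                v = min(work, key=lambda n: (spill_cost[n] / max(len(work[n]), 1), n))
            stack.append(v)
            for n in work.pop(v):
                work[n].discard(v)

        colors, failed = {}, None
        while stack:
            v = stack.pop()
            used = {colors[n] for n in graph[v] if n in colors}
            free = [c for c in range(k) if c not in used]
            if not free:
                failed = v
                break
            colors[v] = free[0]
        if failed is None:
            return colors, spilled

        # Coloring failed: spill the node and restart on the smaller graph.
        spilled.append(failed)
        for n in graph.pop(failed):
            graph[n].discard(failed)


cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
colors, spilled = briggs_color(cycle, 2, {n: 1.0 for n in cycle})
```

On the four-node cycle A-B-C-D-A with two registers, the candidate node is pushed rather than spilled, and when it is popped both of its neighbors turn out to share a color, so no spill is needed.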
Comparison of Chaitin’s Heuristic and OCH
  Try to find a 2-coloring
  [Figure: graph with nodes A, B, C, D]
  Chaitin: A spilled, B->r1, C->r2, D->r1
  OCH: A->r1, B->r2, C->r1, D->r2
Integer Programming (IP)
  Compared with graph coloring, IP
    increases program performance
    reduces code size
  The time to solve a register allocation problem can be significant
  The IP formulation should be as simple as possible
Optimal Register Allocator (ORA)
  ORA uses IP to solve the register allocation problem
  Proposed by Goodwin and Wilken (1996)
  The IP model is very complex, because it contains many redundancies
  Solving the problem is slow
A Faster Optimal Register Allocator
  “A Faster Optimal Register Allocator” uses IP to solve the register allocation problem
  Fu, Wilken and Goodwin (2005)
  The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
  Uses a simplified IP formulation, which makes it faster
Basic ORA Model
Control Flow Graph and ORA Graphs
Basic ORA Model
  Models register allocation as a set of network graphs
    Symbolic-register graphs
    Memory graphs
  An optimal allocation is obtained by selecting a set of graph edges whose total cost is minimal
    Cost = allocation overhead of a decision
IP Formulation
Redundancy
Global Reduction
  Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
    80% of the total decisions generated by the ORA model
Decision Placement
Diamond Region Reductions
  There are 4 reduction techniques which can eliminate unnecessary load, store and deallocation decisions
    Void Region Coupling
      void region
      coupled decision
      paired decision
    Symmetric Decision Selection
    Jump-Edge Nullification
    Asymmetric Decision Elimination
Local Reduction
Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions
Constraint Reduction
  Deallocation constraints
  Must-allocate constraint
  Single-symbolic constraint
  Liveness constraint
Deallocation Constraints
  Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location
    $X^{r}_{s,p-1} \ge X^{r}_{s,p}$
  $X^{r}_{s,p-1}$ represents the allocation state of real register r to symbolic register s before the deallocation constraint location p
  $X^{r}_{s,p}$ represents the allocation state after p
Must-allocate Constraint
  Used to ensure that a symbolic register is allocated to a real register at each definition and each use
    $\sum_{r} X^{r}_{s,p} \ge 1$
  For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant
Single-symbolic Constraint
  Used to ensure that a real register is allocated to at most one symbolic register
    $\sum_{s} X^{r}_{s,p} \le 1$
  For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant
Liveness constraint
  Used to ensure the liveness of a symbolic register
    $\sum_{r} X^{r}_{s,p} + X^{mem}_{s,p} \ge 1$
  $X^{mem}_{s,p}$ represents the allocation state of a symbolic register s to memory at the liveness constraint location p
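
To make the four constraint types concrete, the sketch below checks each one against a given 0-1 assignment. The encoding is an assumption made for illustration: `X[(r, s, p)]` is 1 when real register r holds symbolic register s at location p, `Xmem[(s, p)]` is 1 when s is in memory at p, and missing keys count as 0. An IP solver would search for such an assignment; this code only verifies one.

```python
def deallocation_ok(X, r, s, p):
    """Allocation state may drop across p but not appear: X[p-1] >= X[p]."""
    return X.get((r, s, p - 1), 0) >= X.get((r, s, p), 0)


def must_allocate_ok(X, regs, s, p):
    """Symbolic register s is in at least one real register at p."""
    return sum(X.get((r, s, p), 0) for r in regs) >= 1


def single_symbolic_ok(X, syms, r, p):
    """Real register r holds at most one symbolic register at p."""
    return sum(X.get((r, s, p), 0) for s in syms) <= 1


def liveness_ok(X, Xmem, regs, s, p):
    """Symbolic register s is in some real register or in memory at p."""
    return sum(X.get((r, s, p), 0) for r in regs) + Xmem.get((s, p), 0) >= 1


regs, syms = ["r1", "r2"], ["s1", "s2"]
# s1 sits in r1 at location 0, then is deallocated and kept in memory at 1.
X = {("r1", "s1", 0): 1, ("r1", "s1", 1): 0, ("r2", "s2", 1): 1}
Xmem = {("s1", 1): 1}
```

With this assignment, `s1` still satisfies the liveness constraint at location 1 through memory, while a must-allocate constraint on `s1` at location 1 would be violated.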
Experimental Study
  Compares graph coloring, ORA and Faster ORA
  For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used with a RISC processor
SPEC CPU92 Benchmark Functions
Number of decision variables and constraints produced by basic ORA and Faster ORA
Dynamic spill-code saved using Faster ORA
Dynamic spill code components for SPEC CPU 2000
Conclusion
  Two different solutions to the register allocation problem
    Integer Programming
    Graph Coloring
  The formulations and usages of these solutions are shown
  Faster ORA reduces the number of register allocation IP decision variables compared to the basic IP formulation
  IP gives better results than graph coloring
References
  G. Chaitin and M. Auslander, “Register allocation via coloring,” Computer Languages, 1981
  D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software Practice and Experience, 1996
  C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005
Thank You
ANY QUESTIONS??