CMPE 511 Computer Architecture

A Faster Optimal Register Allocator

Betül Demiröz

8 December 2005

Outline

Motivation of the Study
Register Allocation Problem
Classical Methods (Chaitin & Briggs)
Optimal Register Allocator
Experimental Study

Motivation of the Study

Challenges of Compilers for Embedded Systems
    Power consumption, memory space limitations
    Small set of applications

Compilers can afford long compilation times to generate good-quality code in the various phases
    instruction selection
    instruction scheduling
    register allocation

Motivation of the Study (2)

Instruction Selection
    selecting target machine instructions to implement primitive IR (Intermediate Representation) code instructions
    changes the quality of the code

Instruction Scheduling
    ordering the operations in the compiled code
    decreases the running time of the compiled code

Register Allocation

Problem
    assigning program variables to the available registers
    shapes the runtime performance of the compiled code

Failure to provide an efficient register allocation causes
    an increase in the number of memory accesses
    an increase in code size (affects memory capacity and the overall form factor of the device)
    an increase in power consumption (frequent memory accesses due to poor register allocation)

Register Allocation (2)

NP-Complete (Garey & Johnson, 1976)

Approaches
    Graph Coloring: Chaitin (1981)
    Integer Programming: Goodwin and Wilken (1996)

Graph Coloring

Traditional solution to the register allocation problem.
Graphs are used to represent the symbolic registers.
Each node represents a symbolic register, and an edge connecting two nodes shows that these registers are live at the same point in the program.
Nodes connected by an edge must be given different colors; each color stands for a real register.

Graph Coloring (2)

Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
Spill cost: the runtime cost of loading a variable from memory and storing it back
    depends on address computation, memory operation, and execution frequency
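
The three ingredients of the spill cost can be combined in a simple estimate. The sketch below is only an illustration: the function name, the cost weights, and the idea of counting one memory access per definition and per use are assumptions, not a formula taken from the slides.

```python
def spill_cost(num_defs, num_uses, frequency,
               address_cost=1, memory_op_cost=2):
    """Estimate the runtime cost of keeping one variable in memory.

    Assumed model: every definition needs a store and every use needs a
    load; each access pays for an address computation plus the memory
    operation, and the total is weighted by the execution frequency of
    the surrounding code. The default weights are hypothetical.
    """
    accesses = num_defs + num_uses
    return accesses * (address_cost + memory_op_cost) * frequency


# A variable touched inside a loop body is much more expensive to spill
# than one touched once outside the loop.
outside_loop = spill_cost(num_defs=1, num_uses=2, frequency=1)
inside_loop = spill_cost(num_defs=1, num_uses=2, frequency=10)
```

With such an estimate, an allocator would prefer to spill the variable with the lowest cost when registers run out.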

Live Ranges

A variable Vi is live at a point p in the program if it is defined above p and has not yet been used for the last time.

Live Range (LRi)
    begins with the definition of Vi
    ends with the last use of Vi

If LRi and LRj are simultaneously live at p, then LRi interferes with LRj
    They cannot be stored in the same register.

Interference Graph G_I = G(V, E)
    V = set of individual live ranges
    E = set of edges that represent interferences
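
As a minimal sketch of how an interference graph can be derived from live ranges, the code below assumes that each live range is a single interval of numbered program points (real live ranges can have holes) and that two ranges interfere when they share at least one point. The function name and the example program points are hypothetical.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Build G_I from live ranges.

    live_ranges maps a variable name to (start, end), the inclusive
    program points of its definition and its last use.
    Returns an adjacency dict: name -> set of interfering names.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        a_start, a_end = live_ranges[a]
        b_start, b_end = live_ranges[b]
        # Two intervals overlap when each one starts before the other ends.
        if a_start <= b_end and b_start <= a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical program points: a, b and i overlap; t is used afterwards.
ranges = {"a": (1, 8), "b": (2, 8), "i": (3, 7), "t": (9, 10)}
graph = build_interference_graph(ranges)
```

Here a, b and i all interfere with each other and need three different registers, while t has no neighbors and can reuse any of them.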

Source Code

int main() {
    int a;
    int b;
    int i;
    a = 10;
    b = 1;
    i = 0;
    while (i <= a) {
        b += b * i;
        i++;
        if (b >= 100) break;
    }
    return 0;
}

GaS (GNU Assembler)

main:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    andl $-16, %esp
    movl $0, %eax
    subl %eax, %esp
    movl $10, -4(%ebp)
    movl $1, -8(%ebp)
    movl $0, -12(%ebp)
.L2:
    movl -12(%ebp), %eax
    cmpl -4(%ebp), %eax
    jle .L4
    jmp .L3
    .....

Extended Representation

main:
    subl $4, t1 (t2)
    movl t3, t2
    movl t2, t3 (t4)
    subl $24, t2 (t5)
    andl $-16, t5 (t6)
    movl $0, t7
    subl t7, t6 (t8)
    movl $10, t4
    movl $1, t4
    movl $0, t4
.L2:
    movl t4, t7 (t9)
    cmpl t4, t9
    .....

Interference Graph

[Figure: interference graph with nodes t1 to t15]

Classical Methods for Register Allocation

Register allocator based on Graph Coloring

Chaitin’s Heuristic (limitations for diamond graphs)
Optimistic Coloring Heuristic (Briggs)

Stack-Based Methods

Chaitin’s Heuristic

Initialize stack S to empty.
while (G_I ≠ ∅) do
    while ∃ v ∈ G_I such that degree(v) < k do
        Pick any vertex v such that degree(v) < k
        Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
        Pick a vertex v based on the given Spill Metric
        Spill the live range associated with v.
        Remove v and its edges from G_I
while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
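
The heuristic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the graph is an adjacency dict, k is at least 1, the spill metric is assumed to be spill cost divided by current degree (a common choice, not specified on the slide), and the example graph is assumed to be the four-node cycle A-B, B-C, C-D, D-A.

```python
def chaitin_color(graph, k, spill_cost):
    """Color graph (name -> set of neighbors) with k registers.

    Returns (colors, spilled): colors maps each kept node to a register
    index in 0..k-1, spilled lists the nodes sent to memory.
    """
    work = {v: set(ns) for v, ns in graph.items()}  # mutable copy of G_I
    stack, spilled = [], []

    while work:
        low = [v for v in work if len(work[v]) < k]
        if low:
            v = low[0]
            stack.append(v)
        else:
            # Every remaining node has degree >= k: spill the cheapest
            # one. Assumed metric: spill cost divided by current degree.
            v = min(work, key=lambda n: spill_cost[n] / len(work[n]))
            spilled.append(v)
        # Remove v and its edges from the working graph.
        for n in work.pop(v):
            work[n].discard(v)

    colors = {}
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in used)
    return colors, spilled


# Assumed edges of the four-node example graph.
diamond = {"A": {"B", "D"}, "B": {"A", "C"},
           "C": {"B", "D"}, "D": {"A", "C"}}
costs = {"A": 1, "B": 1, "C": 1, "D": 1}
colors, spilled = chaitin_color(diamond, 2, costs)
# spilled == ["A"]; colors == {"D": 0, "C": 1, "B": 0}
```

With k = 2 every node of this graph has degree 2, so the heuristic spills a node even though the graph is 2-colorable.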

Chaitin-Briggs Heuristic (OCH)

Initialize stack S to empty.
while (G_I ≠ ∅) do
    while ∃ v ∈ G_I such that degree(v) < k do
        Pick any vertex v such that degree(v) < k
        Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
        Pick a vertex v based on the given Spill Metric
        Push v on the stack
        Remove v and its edges from G_I
while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
    If node v cannot be colored, then pick an uncolored node to spill, spill it, and restart at step 1
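
A minimal sketch of the optimistic variant follows. The assumptions are the same as for any small illustration: an adjacency-dict graph, k at least 1, and a spill metric of cost divided by degree. "Spilling" is simplified to removing the node from the graph before restarting; a real allocator would insert spill code and rebuild the live ranges. The example graph edges are also assumed.

```python
def optimistic_color(graph, k, spill_cost):
    """Chaitin-Briggs style coloring: spill candidates are pushed on the
    stack and only spilled if no color is left when they are popped."""
    graph = {v: set(ns) for v, ns in graph.items()}
    spilled = []
    while True:
        work = {v: set(ns) for v, ns in graph.items()}
        stack = []
        while work:
            low = [v for v in work if len(work[v]) < k]
            if low:
                v = low[0]
            else:
                # Spill candidate: push it anyway and stay optimistic.
                v = min(work, key=lambda n: spill_cost[n] / len(work[n]))
            stack.append(v)
            for n in work.pop(v):
                work[n].discard(v)

        colors, failed = {}, None
        while stack:
            v = stack.pop()
            used = {colors[n] for n in graph[v] if n in colors}
            free = [c for c in range(k) if c not in used]
            if not free:
                failed = v
                break
            colors[v] = free[0]
        if failed is None:
            return colors, spilled
        # Spill the uncolorable node and restart on the smaller graph.
        spilled.append(failed)
        for n in graph.pop(failed):
            graph[n].discard(failed)


# Assumed edges: a four-node cycle and a triangle.
diamond = {"A": {"B", "D"}, "B": {"A", "C"},
           "C": {"B", "D"}, "D": {"A", "C"}}
triangle = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"}}
ones = {name: 1 for name in "ABCDXYZ"}
diamond_colors, diamond_spills = optimistic_color(diamond, 2, ones)
triangle_colors, triangle_spills = optimistic_color(triangle, 2, ones)
```

On the four-node cycle the candidate pushed optimistically still finds a free color, so nothing is spilled; on the triangle a spill is unavoidable with two registers.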

Comparison of Chaitin’s Heuristic and OCH

Try to find a 2-coloring of the graph with nodes A, B, C, D

[Figure: diamond graph with nodes A, B, C, D]

Chaitin (A spilled, B->r1, C->r2, D->r1)

OCH (A->r1, B->r2, C->r1, D->r2)

Integer Programming (IP)

Compared with graph coloring, IP
    increases program performance
    reduces code size

The time to solve a register allocation problem can be significant
The IP formulation should be as simple as possible
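
To make the trade-off concrete, the sketch below finds a minimum-cost spill set by exhaustive search. This is deliberately not an integer program and not the ORA formulation; it is a brute-force stand-in that shows what "optimal" means on a tiny graph and why solution time grows quickly with problem size. All names and the example costs are hypothetical.

```python
from itertools import combinations, product


def is_k_colorable(graph, nodes, k):
    """Try every assignment of k colors to nodes (exponential time)."""
    nodes = list(nodes)
    for assignment in product(range(k), repeat=len(nodes)):
        colors = dict(zip(nodes, assignment))
        if all(colors[u] != colors[v]
               for u in nodes for v in graph[u] if v in colors):
            return True
    return False


def optimal_spill_set(graph, k, spill_cost):
    """Return (spill_set, cost) with the lowest total spill cost such
    that the remaining nodes can be colored with k registers."""
    nodes = list(graph)
    best, best_cost = None, float("inf")
    for size in range(len(nodes) + 1):
        for spill in combinations(nodes, size):
            cost = sum(spill_cost[v] for v in spill)
            kept = [v for v in nodes if v not in spill]
            if cost < best_cost and is_k_colorable(graph, kept, k):
                best, best_cost = set(spill), cost
    return best, best_cost


triangle = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"}}
costs = {"X": 5, "Y": 1, "Z": 3}
best_spill, best_cost = optimal_spill_set(triangle, 2, costs)
```

The search examines every subset of nodes and every coloring of the rest, which is only feasible for very small graphs; this is the kind of cost that a compact IP formulation tries to keep under control.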

Optimal Register Allocator (ORA)

ORA uses IP to solve the register allocation problem
Proposed by Goodwin and Wilken (1996)
The IP model is very complex, because it contains many redundancies
Solving the problem is slow

A Faster Optimal Register Allocator

“A Faster Optimal Register Allocator” uses IP to solve the register allocation problem
Fu, Wilken and Goodwin (2005)
The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
Uses a simplified IP formulation, which makes solving faster

Basic ORA Model

Control Flow Graph and ORA Graphs

Basic ORA Model

Models register allocation as a set of network graphs

    Symbolic-register graphs
    Memory graphs

An optimal allocation is obtained by selecting a set of graph edges whose total cost is minimal

Cost = allocation overhead of a decision

IP Formulation

Redundancy

Global Reduction

Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
80% of the total decisions generated by the ORA model

Decision Placement

Diamond Region Reductions

There are 4 reduction techniques which can eliminate unnecessary load, store and deallocation decisions

Void Region Coupling
    void region
    coupled decision
    paired decision
Symmetric Decision Selection
Jump-Edge Nullification
Asymmetric Decision Elimination

Local Reduction

Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions

Constraint Reduction

Deallocation constraints
Must-allocate constraint
Single-symbolic constraint
Liveness constraint

Deallocation Constraints

Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location

$x^{r}_{s,p-1} \ge x^{r}_{s,p}$

$x^{r}_{s,p-1}$ represents the allocation state of real register r to symbolic register s before the deallocation constraint location p
$x^{r}_{s,p}$ represents the allocation state after p

Must-allocate Constraint

Used to ensure that a symbolic register is allocated to a real register at each definition and each use

$\sum_{r} x^{r}_{s,p} \ge 1$

For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant

Single-symbolic Constraint

Used to ensure that a real register is allocated to at most one symbolic register

$\sum_{s} x^{r}_{s,p} \le 1$

For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant

Liveness constraint

Used to ensure the liveness of a symbolic register

$\sum_{r} x^{r}_{s,p} + x^{mem}_{s,p} \ge 1$

$x^{mem}_{s,p}$ represents the allocation state of symbolic register s to memory at the liveness constraint location p
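
The four constraint types can be checked mechanically on a candidate 0/1 allocation. The sketch below is an illustration under several assumptions: locations are consecutive integers in straight-line code (so p-1 is the location before p), missing entries count as 0, and the function name and argument layout are hypothetical rather than taken from the ORA papers.

```python
def violated_constraints(x, xmem, regs, syms, points,
                         must_allocate, live, dealloc):
    """Return the constraints that a 0/1 allocation breaks.

    x[(r, s, p)] == 1: real register r holds symbolic register s at p.
    xmem[(s, p)] == 1: s is in memory at p.
    must_allocate, live and dealloc are sets of (s, p) pairs.
    """
    def state(r, s, p):
        return x.get((r, s, p), 0)

    bad = []
    for p in points:
        for r in regs:
            # Single-symbolic: r holds at most one symbolic register.
            if sum(state(r, s, p) for s in syms) > 1:
                bad.append(("single-symbolic", r, p))
        for s in syms:
            in_regs = sum(state(r, s, p) for r in regs)
            # Must-allocate: s is in a real register at each def and use.
            if (s, p) in must_allocate and in_regs < 1:
                bad.append(("must-allocate", s, p))
            # Liveness: a live s is in some register or in memory.
            if (s, p) in live and in_regs + xmem.get((s, p), 0) < 1:
                bad.append(("liveness", s, p))
            # Deallocation: r may be released at p, never newly acquired.
            if (s, p) in dealloc:
                for r in regs:
                    if state(r, s, p - 1) < state(r, s, p):
                        bad.append(("deallocation", r, s, p))
    return bad


# One real register; s1 is moved to memory at p = 2 so s2 can use r1.
regs, syms, points = ["r1"], ["s1", "s2"], [0, 1, 2]
x = {("r1", "s1", 0): 1, ("r1", "s1", 1): 1, ("r1", "s2", 2): 1}
xmem = {("s1", 2): 1}
must = {("s1", 0), ("s2", 2)}
live = {("s1", 0), ("s1", 1), ("s1", 2), ("s2", 2)}
dealloc = {("s1", 2)}
ok = violated_constraints(x, xmem, regs, syms, points, must, live, dealloc)

# Keeping s1 in r1 at p = 2 as well makes r1 hold two symbolic registers.
x_conflict = dict(x)
x_conflict[("r1", "s1", 2)] = 1
conflict = violated_constraints(x_conflict, xmem, regs, syms, points,
                                must, live, dealloc)
```

An IP solver searches over all such 0/1 assignments for the cheapest one that violates nothing, which is why removing redundant constraints shrinks the problem it has to solve.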

Experimental Study

Compares graph coloring, ORA and Faster ORA
For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used with a RISC processor

SPEC CPU92 Benchmark Functions

Number of decision variables and constraints produced by basic ORA and Faster ORA

Dynamic spill-code saved using Faster ORA

Dynamic spill code components for SPEC CPU 2000

Conclusion

Two different solutions to the register allocation problem
    Integer Programming
    Graph Coloring

The formulations and uses of these solutions are shown
Faster ORA reduces the number of register allocation IP decision variables compared to the basic IP formulation
IP gives better results than graph coloring

References

G. Chaitin and M. Auslander, “Register allocation via coloring,” Computer Languages, 1981
D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software Practice and Experience, 1996
C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005

Thank You

ANY QUESTIONS??
