CMPE 511 Computer Architecture

A Faster Optimal Register Allocator

Betül Demiröz

8 December 2005

Outline

Motivation of the Study
Register Allocation Problem
Classical Methods (Chaitin & Briggs)
Optimal Register Allocator
Experimental Study

Motivation of the Study

Challenges of Compilers for Embedded Systems
    Power consumption and memory space limitations
    Small set of applications

Such compilers can afford long compilation times to generate good-quality code in their various phases
    instruction selection
    instruction scheduling
    register allocation

Motivation of the Study (2)

Instruction Selection
    selecting target machine instructions to implement primitive IR (intermediate representation) instructions
    changes the quality of the code

Instruction Scheduling
    ordering the operations in the compiled code
    decreases the running time of the compiled code

Register Allocation

Problem
    assigning program variables to the available registers
    shapes the runtime performance of the compiled code

Failure to provide an efficient register allocation causes
    an increase in the number of memory accesses
    an increase in code size (affects memory capacity and the overall form factor of the device)
    an increase in power consumption (frequent memory accesses due to poor register allocation)

Register Allocation (2)

NP-Complete (Garey & Johnson, 1976)

Approaches
    Graph Coloring
        Chaitin (1981)
    Integer Programming
        Goodwin and Wilken (1996)

Graph Coloring

Traditional solution to the register allocation problem.
Graphs are used to represent register usage.
Each node represents a symbolic register, and an edge connecting two nodes shows that those registers are live at the same point in the program.
Nodes connected by an edge must be given different colors, where each color stands for a real register.
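
The coloring rule above can be sketched as a small check. This is only an illustration: the graph, the node names and the register names are hypothetical, and the graph is assumed to be stored as a dictionary from each node to the set of its neighbours.

```python
def is_valid_coloring(graph, coloring):
    """Return True if no edge joins two nodes that share a color (register)."""
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


# Hypothetical interference graph: t1 interferes with t2 and with t3.
graph = {"t1": {"t2", "t3"}, "t2": {"t1"}, "t3": {"t1"}}

# t2 and t3 never interfere, so they may share register r2.
good = {"t1": "r1", "t2": "r2", "t3": "r2"}
# t1 and t2 interfere, so giving both r1 breaks the rule.
bad = {"t1": "r1", "t2": "r1", "t3": "r2"}
```

Here `is_valid_coloring(graph, good)` holds while `is_valid_coloring(graph, bad)` does not, which is exactly the constraint a register allocator has to satisfy.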

Graph Coloring (2)

Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
Spill cost: the runtime cost of loading a variable from memory and storing it back
    address computation, memory operation, execution frequency
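
One way to combine the three components listed above is to weight the per-access cost by how often each access executes. The formula and the default cost values below are assumptions made for illustration; the slides do not give a concrete spill-cost formula.

```python
def spill_cost(access_frequencies, addr_cost=1, mem_cost=2):
    """Estimate the runtime cost of spilling one variable.

    access_frequencies: one estimated execution count per load or store
    that spilling would insert (hypothetical input).
    addr_cost, mem_cost: assumed cost of the address computation and of
    the memory operation for a single access.
    """
    return sum((addr_cost + mem_cost) * freq for freq in access_frequencies)


# A store executed once before a loop and a load executed 10 times inside it.
cost = spill_cost([1, 10])
```

With these assumed costs the variable's spill cost is (1 + 2) * (1 + 10) = 33, so accesses inside loops dominate the estimate.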

Live Ranges

A variable V_i is live at a point p in the program if
    it is defined above p and has not yet been used for the last time.

Live range (LR_i)
    begins with the definition of V_i
    ends with the last use of V_i

If LR_i and LR_j are simultaneously live at p, then LR_i interferes with LR_j
    They cannot be stored in the same register.

Interference graph G_I = G(V, E)
    V = set of individual live ranges
    E = set of edges that represent interferences
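
A minimal sketch of building G_I from live ranges follows. It assumes straight-line code, so that each live range can be written as a half-open interval [start, end) of instruction indices; real allocators work on control flow graphs, and the variable names and intervals here are hypothetical.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """live_ranges: dict name -> (start, end), a half-open interval.

    Two live ranges interfere when their intervals overlap, i.e. when
    both are live at some common point.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        a_start, a_end = live_ranges[a]
        b_start, b_end = live_ranges[b]
        if a_start < b_end and b_start < a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# b dies exactly where c is defined, so b and c do not interfere.
live_ranges = {"a": (0, 5), "b": (1, 3), "c": (3, 6)}
graph = build_interference_graph(live_ranges)
```

In this example `a` interferes with both `b` and `c`, while `b` and `c` could share a register.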

Source Code

    int main() {
        int a;
        int b;
        int i;
        a = 10;
        b = 1;
        i = 0;
        while (i <= a) {
            b += b * i;
            i++;
            if (b >= 100) break;
        }
        return 0;
    }

GAS (GNU Assembler)

    main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        andl $-16, %esp
        movl $0, %eax
        subl %eax, %esp
        movl $10, -4(%ebp)
        movl $1, -8(%ebp)
        movl $0, -12(%ebp)
    .L2:
        movl -12(%ebp), %eax
        cmpl -4(%ebp), %eax
        jle .L4
        jmp .L3
        .....

Extended Representation

    main:
        subl $4, t1 (t2)
        movl t3, t2
        movl t2, t3 (t4)
        subl $24, t2 (t5)
        andl $-16, t5 (t6)
        movl $0, t7
        subl t7, t6 (t8)
        movl $10, t4
        movl $1, t4
        movl $0, t4
    .L2:
        movl t4, t7 (t9)
        cmpl t4, t9
        .....

Interference Graph

[Figure: interference graph whose nodes are the symbolic registers t1 to t15]

Classical Methods for Register Allocation

Register allocator based on Graph Coloring

    Chaitin’s Heuristic (limitations for diamond graphs)
    Optimistic Coloring Heuristic (Briggs)

Stack-Based Methods

Chaitin’s Heuristic

(k is the number of available registers, deg(v) is the degree of vertex v in G_I)

Initialize stack S to empty.
while (G_I ≠ ∅) do
    while ∃ v ∈ G_I such that deg(v) < k do
        Pick any vertex v such that deg(v) < k.
        Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
        Pick a vertex v based on the given spill metric.
        Spill the live range associated with v.
        Remove v and its edges from G_I.
while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
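
The pseudocode above can be sketched in Python as follows. This is an illustrative version, not the original implementation: the dictionary-of-sets graph, the `spill_metric` callback (lowest value is spilled first) and the four-node cycle used as input are all assumptions.

```python
def chaitin_color(graph, k, spill_metric):
    """graph: dict node -> set of neighbours; k: number of colors.

    spill_metric(node, work_graph) returns a number; when no node has
    degree < k, the node with the lowest value is spilled.
    """
    work = {v: set(ns) for v, ns in graph.items()}  # mutable copy of G_I
    stack, spilled = [], []

    while work:
        low = [v for v in work if len(work[v]) < k]
        if low:
            v = low[0]            # any vertex with degree < k
            stack.append(v)
        else:
            v = min(work, key=lambda n: spill_metric(n, work))
            spilled.append(v)     # spilled immediately, never colored
        for n in work.pop(v):     # remove v and its edges
            work[n].discard(v)

    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in graph[v] if n in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring, spilled


# Hypothetical 4-cycle A-B-C-D-A, to be colored with k = 2 colors.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
coloring, spilled = chaitin_color(cycle, 2, lambda n, g: 1)
```

Every node of the cycle has degree 2, so no node satisfies deg(v) < 2 and the heuristic spills one node (here `A`, because the constant metric breaks ties by insertion order) even though the cycle is 2-colorable.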

Chaitin-Briggs Heuristic (OCH)

(k is the number of available registers, deg(v) is the degree of vertex v in G_I)

Initialize stack S to empty.
while (G_I ≠ ∅) do
    while ∃ v ∈ G_I such that deg(v) < k do
        Pick any vertex v such that deg(v) < k.
        Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
        Pick a vertex v based on the given spill metric.
        Push v on the stack.
        Remove v and its edges from G_I.
while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
    If node v cannot be colored, then pick an uncolored node to spill, spill it, and restart from the first step.
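
A matching sketch of the optimistic variant is given below, under the same assumptions as before (dictionary-of-sets graph, hypothetical `spill_metric` callback). One simplification is labeled in the code: when a node cannot be colored, this sketch spills that node itself and restarts on the remaining graph, whereas the heuristic allows any uncolored node to be chosen.

```python
def briggs_color(graph, k, spill_metric):
    """Optimistic coloring: spill candidates are pushed, not spilled."""
    work = {v: set(ns) for v, ns in graph.items()}
    stack = []
    while work:
        low = [v for v in work if len(work[v]) < k]
        # No low-degree node: push a spill candidate optimistically.
        v = low[0] if low else min(work, key=lambda n: spill_metric(n, work))
        stack.append(v)
        for n in work.pop(v):
            work[n].discard(v)

    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in graph[v] if n in coloring}
        free = [c for c in range(k) if c not in used]
        if not free:
            # Simplification: spill v itself, then restart without it.
            rest = {u: ns - {v} for u, ns in graph.items() if u != v}
            coloring, spilled = briggs_color(rest, k, spill_metric)
            return coloring, [v] + spilled
        coloring[v] = free[0]
    return coloring, []


# Hypothetical 4-cycle A-B-C-D-A with k = 2 colors.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
coloring, spilled = briggs_color(cycle, 2, lambda n, g: 1)
```

On the 4-cycle the deferred decision pays off: by the time `A` is popped, its two neighbours happen to share a color, so `A` gets the other one and nothing is spilled. On a triangle with k = 2 the same code still has to spill one node.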

Comparison of Chaitin’s Heuristic and OCH

Try to find a 2-coloring

[Figure: graph with four nodes A, B, C and D]

Chaitin: A spilled, B -> r1, C -> r2, D -> r1

OCH: A -> r1, B -> r2, C -> r1, D -> r2

Integer Programming (IP)

Compared with graph coloring, IP
    increases program performance
    reduces code size

The time to solve a register allocation problem can be significant
The IP formulation should be as simple as possible

Optimal Register Allocator (ORA)

ORA uses IP to solve the register allocation problem
Proposed by Goodwin and Wilken (1996)
The IP model is very complex because it contains many redundancies
Solving the problem is slow

A Faster Optimal Register Allocator

“A Faster Optimal Register Allocator” uses IP to solve the register allocation problem
Fu, Wilken and Goodwin (2005)
The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
Uses a simplified IP formulation, which makes it faster

Basic ORA Model

Control Flow Graph and ORA Graphs

Basic ORA Model

Models register allocation as a set of network graphs

    Symbolic-register graphs
    Memory graphs

An optimal allocation is obtained by selecting a set of graph edges whose total cost is minimal

Cost = allocation overhead of a decision
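
The idea of picking 0-1 decisions with minimal total cost can be illustrated with a toy model. Everything below is an assumption for illustration: the four decision variables, their costs and the constraints are made up, and brute-force enumeration stands in for a real IP solver. It is not the ORA network model itself.

```python
from itertools import product


def solve_01(costs, feasible):
    """Return (total_cost, bits) for the cheapest feasible 0-1 assignment,
    or None if no assignment satisfies the constraints."""
    best = None
    for bits in product((0, 1), repeat=len(costs)):
        if feasible(bits):
            total = sum(c * b for c, b in zip(costs, bits))
            if best is None or total < best[0]:
                best = (total, bits)
    return best


# Hypothetical decisions at one program point with one real register:
#   x0 = s1 in register, x1 = s2 in register,
#   x2 = s1 spilled (cost 4), x3 = s2 spilled (cost 10)
costs = [0, 0, 4, 10]


def feasible(x):
    return (x[0] + x[1] <= 1      # the register holds at most one of s1, s2
            and x[0] + x[2] >= 1  # s1 is in the register or in memory
            and x[1] + x[3] >= 1) # s2 is in the register or in memory


best = solve_01(costs, feasible)
```

The cheapest feasible choice keeps `s2` in the register and spills `s1`, for a total cost of 4. Enumeration grows exponentially with the number of decisions, which is why reducing the number of decision variables matters for solve time.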

IP Formulation

Redundancy

Global Reduction

Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
These decisions make up 80% of the total decisions generated by the ORA model

Decision Placement

Diamond Region Reductions

There are 4 reduction techniques which can eliminate unnecessary load, store and deallocation decisions
    Void Region Coupling
        void region
        coupled decision
        paired decision
    Symmetric Decision Selection
    Jump-Edge Nullification
    Asymmetric Decision Elimination

Local Reduction

Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions

Constraint Reduction

Deallocation constraint
Must-allocate constraint
Single-symbolic constraint
Liveness constraint

Deallocation Constraints

Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location

    $X^{r}_{s,p-1} \ge X^{r}_{s,p}$

$X^{r}_{s,p-1}$ represents the allocation state of real register r to symbolic register s before the deallocation constraint location p
$X^{r}_{s,p}$ represents the allocation state after p

Must-allocate Constraint

Used to ensure that a symbolic register is allocated to a real register at each definition and each use

    $\sum_{r} X^{r}_{s,p} \ge 1$

For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant

Single-symbolic Constraint

Used to ensure that a real register is allocated to at most one symbolic register

    $\sum_{s} X^{r}_{s,p} \le 1$

For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant

Liveness constraint

Used to ensure the liveness of a symbolic register

    $\sum_{r} X^{r}_{s,p} + X^{mem}_{s,p} \ge 1$

$X^{mem}_{s,p}$ represents the allocation state of a symbolic register s to memory at the liveness constraint location p
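
The four constraint types from the preceding slides can be written as small predicates over 0-1 variables. This is a sketch under assumptions: the allocation state is stored in plain dictionaries keyed by (r, s, p) and (s, p), and the register names, symbolic names and values are hypothetical.

```python
def deallocation_ok(x, r, s, p):
    # X[r, s, p-1] >= X[r, s, p]: the state may drop from 1 to 0 at p
    return x[r, s, p - 1] >= x[r, s, p]


def must_allocate_ok(x, regs, s, p):
    # sum over real registers r of X[r, s, p] >= 1
    return sum(x[r, s, p] for r in regs) >= 1


def single_symbolic_ok(x, r, syms, p):
    # sum over symbolic registers s of X[r, s, p] <= 1
    return sum(x[r, s, p] for s in syms) <= 1


def liveness_ok(x, xmem, regs, s, p):
    # sum over r of X[r, s, p] + Xmem[s, p] >= 1
    return sum(x[r, s, p] for r in regs) + xmem[s, p] >= 1


regs, syms = ["r1"], ["s1", "s2"]
# s1 holds r1 at point 0 and is deallocated at point 1, where s2 takes r1.
x = {("r1", "s1", 0): 1, ("r1", "s1", 1): 0,
     ("r1", "s2", 0): 0, ("r1", "s2", 1): 1}
# s1 lives in memory at point 1; s2 lives in memory at point 0.
xmem = {("s1", 0): 0, ("s1", 1): 1, ("s2", 0): 1, ("s2", 1): 0}
```

For this state, `deallocation_ok(x, "r1", "s1", 1)` and `liveness_ok(x, xmem, regs, "s1", 1)` hold, while `must_allocate_ok(x, regs, "s1", 1)` fails, so point 1 could not be a definition or use of `s1`.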

Experimental Study

Compares graph coloring, ORA and Faster ORA
For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used with a RISC processor

SPEC CPU92 Benchmark Functions

Number of decision variables and constraints produced by basic ORA and Faster ORA

Dynamic spill-code saved using Faster ORA

Dynamic spill code components for SPEC CPU 2000

Conclusion

Two different solutions to the register allocation problem
    Integer Programming
    Graph Coloring

The formulations and usages of these solutions are shown
Faster ORA reduces the number of register allocation IP decision variables compared to the basic IP formulation
IP gives better results than graph coloring

References

G. Chaitin and M. Auslander, “Register allocation via coloring,” Computer Languages, 1981
D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software Practice and Experience, 1996
C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005

Thank You

ANY QUESTIONS??