
2005 International Symposium on Code Generation and Optimization

Progressive Register Allocation for Irregular Architectures

David Koes, dkoes@cs.cmu.edu

Seth Copen Goldstein, seth@cs.cmu.edu

March 23, 2005


Irregular Architectures

• Few registers

• Register usage restrictions
  – address registers, hardwired registers...

• Memory operands

• Examples:
  – x86, 68k, ColdFire, ARM Thumb, MIPS16, V800, various DSPs...

[Figure: the eight x86 general-purpose registers: eax, ebx, ecx, edx, esi, edi, ebp, esp]


Fewer Registers → More Spills

• Used gcc to compile >10,000 functions from Mediabench, Spec95, Spec2000, and micro-benchmarks

• Recorded which functions spilled

[Figure: bar chart of the percent of functions that spill (y-axis, 0 to 50 percent) for PPC (32 registers), 68k (16 registers), and x86 (8 registers)]


Register Usage Restrictions

• Instructions may prefer or require a specific subset of registers
  – x86 multiply instruction
      imul %edx,%eax   // 2 byte instruction
      imul %edx,%ecx   // 3 byte instruction
  – x86 divide instruction
      idivl %ecx       // eax = edx:eax/ecx


Memory Operands

• Load/store not always needed to access variables allocated to memory
  – depends upon instruction
  – still less efficient than register access

      addl 8(%ebp), %eax
  vs
      movl 8(%ebp), %edx
      addl %edx, %eax


Register Allocation Challenges

• Optimize spill code
  – with few registers, spilling unavoidable

• Model register usage restrictions

• Exploit memory operands
  – affects spilling decisions


Previous Work

[Table: methods compared on Models Irregular Features, Fast, and Optimal; per-method marks not recoverable]

• Graph Coloring

• Integer Programming [Goodwin and Wilken 96], [Kong and Wilken 98], [Fu and Wilken 2002]

• Separated IP [Appel and George 01]

• PBQP [Scholz and Eckstein 02]


Our Goals

• Expressive
  – Explicitly represent architectural irregularities and costs

• Proper model
  – An optimum solution results in optimal register allocation

• Progressive solution algorithm
  – more computation → better solution
  – decent feasible solution obtained quickly
  – competitive with current allocators


Multicommodity Network Flow (MCNF)

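[Figure: example multicommodity network flow for two variables, a and b, with source, instruction, crossbar, and sink nodes; edges carry numeric labels (2 and 4)]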


Modeling Usage Constraints

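int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}

[Figure: flow network for foo. Variables a, b, and c flow through an imul instruction node and an idiv instruction node; each instruction node offers the allocation classes eax, edx, ecx, and mem, and edges carry labels such as 1 and -1.]

not quite right…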


Modeling Spills and Moves

int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}

[Figure: flow network for foo with additional rows of eax, edx, ecx, and mem nodes placed before and after the imul and idiv instruction nodes, through which variables a, b, and c can flow; edge labels include 1, -1, and 3.]


Modeling Stores

• Simple approach flawed
  – doesn’t model memory persistency

• Solution: antivariables
  – flow only through memory
  – eviction cost = store cost
  – evict only once


Register Allocation as MCNF

• Variables → Commodities

• Variable Usage → Network Design

• Nodes → Allocation Classes (Reg/Mem)

• Register Limits → Node Capacities

• Spill Costs → Edge Costs

• Variable Definition → Source

• Variable Last Use → Sink
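The mapping above can be made concrete with a toy instance. The sketch below is not the authors' implementation: it assumes one register (`r0`) plus memory, invents the costs `LOAD`, `STORE`, and `MEM_USE`, lets a variable start in any class for free, and replaces a real MCNF solver with brute-force enumeration. Each variable is a commodity whose path picks one allocation class per program point, and the register's node capacity limits how many paths may share it.

```python
from itertools import product

CLASSES = ["r0", "mem"]             # allocation classes: one register plus memory
CAPACITY = {"r0": 1, "mem": None}   # node capacities; None means unlimited
LOAD, STORE, MEM_USE = 3, 3, 2      # hypothetical costs, not taken from the talk


def edge_cost(src, dst):
    """Cost of a variable moving from class src to class dst between two points."""
    if src == dst:
        return 0
    return LOAD if src == "mem" else STORE


def path_cost(var, path, live, uses):
    """Cost of one commodity's path: moves plus memory-operand penalties at uses."""
    start = live[var][0]
    moves = sum(edge_cost(a, b) for a, b in zip(path, path[1:]))
    mem_uses = sum(MEM_USE for i, cls in enumerate(path)
                   if cls == "mem" and start + i in uses.get(var, ()))
    return moves + mem_uses


def feasible(paths, live):
    """True if no capacitated class holds more variables than it has room for."""
    last = max(hi for _, hi in live.values())
    for point in range(last + 1):
        for cls, cap in CAPACITY.items():
            if cap is None:
                continue
            held = sum(1 for v, path in paths.items()
                       if live[v][0] <= point <= live[v][1]
                       and path[point - live[v][0]] == cls)
            if held > cap:
                return False
    return True


def solve(live, uses):
    """Brute-force minimum-cost integer flow; only viable for tiny blocks."""
    names = sorted(live)
    choices = [product(CLASSES, repeat=live[v][1] - live[v][0] + 1) for v in names]
    best = None
    for combo in product(*choices):
        paths = dict(zip(names, combo))
        if not feasible(paths, live):
            continue
        cost = sum(path_cost(v, p, live, uses) for v, p in paths.items())
        if best is None or cost < best[0]:
            best = (cost, paths)
    return best
```

For example, `solve({"a": (0, 1), "b": (0, 1)}, {"a": {0, 1}, "b": {1}})` keeps `a` in the register at both points and leaves `b` in memory, paying one memory-operand penalty.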


Solving an MCNF

• Integer solution NP-complete

• Use standard IP solvers
  – commercial solvers (CPLEX) are impressive

• Exploit structure of problem
  – variety of MCNF specific solvers
    • empirically faster than IP solvers
    • Lagrangian Relaxation technique


Lagrangian Relaxation: Intuition

• Relaxes the hard constraints
  – only have to solve single commodity flow

• Combines easy subproblems using a Lagrangian multiplier
  – an additional price on each edge

Example: edges have unit capacity

[Figure: two commodities, a and b, choose between an edge labeled 0 and an edge labeled 1; after pricing, the labels become 0+1 and 1]

With price, solution to single commodity flow can be solution to multicommodity flow
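The pricing intuition can be checked on a two-edge toy. The sketch below reads the figure's labels as edge costs 0 and 1; the edge names `cheap` and `costly` and the helper functions are hypothetical. Each commodity solves its own single-commodity problem by picking a minimum priced-cost edge, and `assign` then looks for a combination of those individual optima that respects the unit capacities.

```python
COST = {"cheap": 0, "costly": 1}       # edge costs, read off the slide's labels
CAPACITY = {"cheap": 1, "costly": 1}   # edges have unit capacity


def cheapest_edges(price):
    """Edges that are optimal for a single commodity under the given prices."""
    priced = {e: COST[e] + price.get(e, 0) for e in COST}
    best = min(priced.values())
    return sorted(e for e in priced if priced[e] == best)


def assign(commodities, price):
    """Give each commodity one of its cheapest edges; None if capacity runs out."""
    options = cheapest_edges(price)
    load = {e: 0 for e in COST}
    result = {}
    for k in commodities:
        for e in options:
            if load[e] < CAPACITY[e]:
                result[k] = e
                load[e] += 1
                break
        else:
            return None  # every cheapest edge is already full
    return result
```

Without prices, both commodities want the cost-0 edge and `assign(["a", "b"], {})` returns `None`. With a price of 1 on that edge, both edges tie at priced cost 1, so `assign(["a", "b"], {"cheap": 1})` finds a split that is optimal for each commodity on its own and feasible together.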


Solution Procedure

• Compute prices using iterative subgradient optimization
  – converge to optimal prices

• At each iteration, greedily construct a feasible solution using current prices
  – allocate most expensive vars first
  – can always find an allocation
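A minimal sketch of this loop, under strong simplifying assumptions that are not from the talk: a single program point, registers of capacity 1 with zero cost, memory costing each variable its spill cost, and a step size of 1/t. The function name `solve` and its parameters are hypothetical. Memory is uncapacitated, which is why the greedy pass can always find an allocation.

```python
def solve(spill_cost, registers, iterations=50):
    """Subgradient price updates plus a greedy feasible allocation per iteration."""
    edges = registers + ["mem"]
    price = {r: 0.0 for r in registers}  # Lagrangian price per register

    def priced(var, edge):
        # Memory costs the variable's spill cost; a register costs only its price.
        return spill_cost[var] if edge == "mem" else price[edge]

    best_ub, best_alloc, best_lb = float("inf"), None, float("-inf")
    for t in range(1, iterations + 1):
        # Relaxed problem: each variable independently picks its cheapest edge.
        choice = {v: min(edges, key=lambda x: priced(v, x)) for v in spill_cost}
        # Lagrangian lower bound; every register has capacity 1.
        lb = sum(priced(v, e) for v, e in choice.items()) - sum(price.values())
        best_lb = max(best_lb, lb)

        # Greedy feasible allocation, most expensive variables first.
        free = set(registers)
        alloc = {}
        for v in sorted(spill_cost, key=spill_cost.get, reverse=True):
            options = [e for e in edges if e == "mem" or e in free]
            pick = min(options, key=lambda x: priced(v, x))
            alloc[v] = pick
            free.discard(pick)
        ub = sum(spill_cost[v] for v, e in alloc.items() if e == "mem")
        if ub < best_ub:
            best_ub, best_alloc = ub, alloc

        # Subgradient step: raise prices on overused registers, lower on idle ones.
        for r in registers:
            flow = sum(1 for e in choice.values() if e == r)
            price[r] = max(0.0, price[r] + (flow - 1) / t)
    return best_alloc, best_ub, best_lb
```

With `solve({"a": 5, "b": 3, "c": 1}, ["r0", "r1"])`, the greedy pass places `a` and `b` in registers and spills `c`, while the returned lower bound never exceeds the cost of that allocation.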


Solution Procedure

• Advantages
  + have feasible solution at each step
  + iterative nature → progressive
  + Lagrangian relaxation theory provides means for computing a lower bound
  + Can compute optimality bound

• Disadvantages
  – No guarantee of optimality of solution
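The lower bound and optimality bound mentioned above follow from standard Lagrangian duality. The notation here is assumed rather than taken from the slides: $c^k_e$ is the cost of commodity $k$ on edge $e$, $u_e$ the capacity of edge $e$, $w_e \ge 0$ its price, $X$ the set of flows that satisfy each commodity's own constraints, $z^*$ the optimal cost, and $\bar{z}$ the cost of the current feasible allocation.

```latex
% Lagrangian function: capacity constraints are priced instead of enforced
\[
L(w) \;=\; \min_{x \in X} \; \sum_{k} \sum_{e} \left(c^k_e + w_e\right) x^k_e \;-\; \sum_{e} w_e\, u_e
\]

% Weak duality: any prices give a lower bound, any feasible allocation an upper bound
\[
L(w) \;\le\; z^{*} \;\le\; \bar{z}
\]

% Relative optimality bound for the current allocation (requires L(w) > 0)
\[
\frac{\bar{z} - z^{*}}{z^{*}} \;\le\; \frac{\bar{z} - L(w)}{L(w)}
\]
```

When the right-hand side of the last inequality is zero, the current allocation is proven optimal even though the procedure itself gives no guarantee of reaching that point.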


Evaluation

• Replace gcc’s local allocator

• Optimize for code size
  – easy to statically evaluate

• Evaluate on MediaBench, MiBench, SpecInt95, SpecInt2000
  – consider only blocks where local allocation is interesting (enough variables to spill)


Behavior of Solver


Proven Optimality

[Figure: stacked bar chart of the percentage of blocks (0% to 100%) whose solution is Optimal, Within 5%, Within 10%, Within 15%, Within 20%, or >25%, after 1, 10, 100, and 1000 iterations. Blocks are grouped by number of conflicts: 5-10 (355 blocks), 10-15 (23 blocks), 15-20 (7 blocks), >= 20 (5 blocks).]


Comprehensive Results

[Figure: bar chart of improvement over gcc (y-axis, -15.00% to 20.00%) after 1, 10, 100, and 1000 iterations. Blocks are grouped by number of conflicts: 5-10 (355 blocks), 10-15 (23 blocks), 15-20 (7 blocks), >= 20 (5 blocks). Annotation: artifact of interaction with gcc.]


Progressive Nature

:-(


Contributions

• New MCNF model for register allocation
  + expressive, can model irregular architectures
  + can be solved using conventional ILP solvers

• Progressive solution procedure
  + decent initial solution
  + maintains feasible solution
  + improves solution over time
  – no optimality guarantees
