Learning for Optimizing Compilers
University of Massachusetts, Amherst • Department of Computer Science
John Cavazos
Architecture and Language Implementation Lab
Thesis Seminar
University of Massachusetts, Amherst
Learning for Optimizing Compilers
Motivation
Compiler writers have a difficult task
  Optimizations are NP-hard
  Computer architectures are complex
  Computer architects need rapid evaluation
Generating heuristics manually is slow, complicated, and ad hoc.
Propose Supervised Learning
Induces heuristics automatically
Training examples
  a, b, c, …, z  label
  a, b, c, …, z: properties of problem
  label: proper decision to make
Two objectives:
  Minimize error
  Prefer less complicated function
LOCO (Learning for Optimizing COmpilers)
Benefits of Supervised Learning
Heuristic construction sped up
Determines relative importance of features
Effective heuristics
  Comparable to hand-tuned heuristics
Theoretically sound
  Traditional approach ad hoc
Taxonomy of Compiler Heuristics
1. What Order to Apply Optimizations
   Phase-ordering heuristics
2. When to Optimize
   Filters
3. Which Optimization Algorithm to Apply
   Hybrid Optimizations
4. How to Optimize
   Priority Functions
The LOCO Methodology
Determine class of heuristic
Generate raw data
  Instrument compiler
Process raw data
  Thresholds
  Generates training data
Induce heuristic
Integrate into compiler
The LOCO Methodology
[Diagram of the LOCO pipeline. Components: Instrumented Compiler, Training Set, Supervised Learning, Production Compiler. Steps: generate raw learning data, process raw data (thresholding), rule induction, induces heuristic.]
Experimental Setup
Java JIT compiler
  Jikes RVM 2.0.2
PowerPC
  533 MHz G4, model 7410
Case Study 1: SPEC JVM benchmarks
Case Study 2: Scientific benchmarks
  Scheduling improves by 4% or more
Case Study 1: Hybrid Register Allocation
Motivation
Register allocation: important
  Effective use of registers
Different algorithms to choose from
  Graph coloring: possibly expensive
  Linear scan: not always effective
Which algorithm to apply?
Solution
Features predict which algorithm to use
Heuristic function controls allocator
  Reduces cost significantly
  Retains most benefit
Successful with simple features
Applicable to other optimizations
Hybrid Register Allocation
Features of Methods
Features                            Meaning
Out, In, and Exception Out Edges    Out, in, and exception out edges in CFG (total, avg)
Live on Entry, Live on Exit         Number of edges live on entry and exit (total, min, max)
Insts and Blocks                    Number of instructions and blocks in method (total)
Block size                          Size of blocks (max, min, avg)
Intervals                           Number of live intervals (max, total, avg)
Symbolics                           Number of symbolics (total, avg)
Hybrid Register Allocation
Inducing Heuristic Controller
For each block generate raw training data
  Features of method
  Additional spills incurred
  Cost of allocation algorithms
Process raw data to generate training set
Leave-one-out cross-validation
Output of LOCO = heuristic controller
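The leave-one-out step can be sketched as follows. This is a minimal illustration, assuming the unit held out is a whole benchmark; the slide does not say what is left out. The `majority_learner` stand-in, the function names, and the data are all hypothetical and are not LOCO's rule inducer.

```python
from collections import Counter

def majority_learner(training):
    """Stand-in learner (hypothetical): always predicts the most common label."""
    most_common = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: most_common

def leave_one_out(instances_by_benchmark, learner=majority_learner):
    """Hold out each benchmark in turn, train on the rest, test on the held-out one.

    instances_by_benchmark maps a benchmark name to a non-empty list of
    (features, label) pairs. Returns the held-out accuracy per benchmark.
    """
    accuracy = {}
    for held_out, test_set in instances_by_benchmark.items():
        training = [inst
                    for name, insts in instances_by_benchmark.items()
                    if name != held_out
                    for inst in insts]
        heuristic = learner(training)
        correct = sum(heuristic(features) == label for features, label in test_set)
        accuracy[held_out] = correct / len(test_set)
    return accuracy

# Made-up data: features are (instructions, live intervals); labels are "GC" or "LS".
data = {
    "bench_a": [((120, 30), "GC"), ((15, 4), "LS")],
    "bench_b": [((20, 5), "LS"), ((18, 6), "LS")],
    "bench_c": [((200, 60), "GC"), ((10, 2), "LS"), ((12, 3), "LS")],
}
results = leave_one_out(data)
print(results)
```

The heuristic tested on a benchmark never sees that benchmark's instances during training, so the accuracy reflects behavior on unseen code.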
Labeling Training Instances
Two factors:
  Cost of register allocation
  Spill benefit of different allocators
Prefer graph coloring
  If benefit above threshold
Prefer linear scan
  If graph coloring cost above threshold
  No spill benefit
Motivation for Threshold Technique
Noise reduction technique
Simplifies learning
  Removes cases of fine distinction
Separation by a threshold gap
For example:
  T = 10%: model estimates improvement by 10%
Thresholding
[Scatter plot. X-axis: LS Spills - GC Spills (log scale, 1.E-01 to 1.E+09). Y-axis: 1 - (LS Cost / GC Cost), from 0 to 1. Marked lines: Cost Threshold (0.5) and Spill Threshold (8192). Regions: Graph Coloring, Linear Scan, No Instance.]
Labeling Training Instances
If (LS_Spill - GC_Spill > Spill_Threshold)
    Print "GC";                               // High Spill Benefit
Else If (LS_Cost/GC_Cost > Cost_Threshold)
    Print "LS";                               // High Cost
Else If (LS_Spill - GC_Spill <= 0)
    Print "LS";                               // No Spill Benefit
Else { /* No Label */ }                       // Skip Training Instance
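The labeling rule above can be written as a small runnable function. This is a sketch, not the author's code. The default thresholds are the values shown on the Thresholding chart (8192 and 0.5). One assumption is flagged loudly: the cost test here uses the quantity plotted on that chart's y-axis, 1 - (LS cost / GC cost), rather than the bare ratio written in the slide pseudocode. The function and parameter names are hypothetical.

```python
SPILL_THRESHOLD = 8192   # spill threshold shown on the Thresholding chart
COST_THRESHOLD = 0.5     # cost threshold shown on the Thresholding chart

def label_instance(ls_spill, gc_spill, ls_cost, gc_cost,
                   spill_threshold=SPILL_THRESHOLD,
                   cost_threshold=COST_THRESHOLD):
    """Return "GC", "LS", or None when the instance is skipped."""
    spill_benefit = ls_spill - gc_spill
    # ASSUMPTION: cost is measured as on the chart's y-axis,
    # i.e. the fraction of graph coloring's cost that linear scan saves.
    cost_saving = 1.0 - ls_cost / gc_cost
    if spill_benefit > spill_threshold:
        return "GC"      # high spill benefit
    if cost_saving > cost_threshold:
        return "LS"      # graph coloring is too expensive
    if spill_benefit <= 0:
        return "LS"      # no spill benefit
    return None          # fine distinction: skip this training instance

print(label_instance(20000, 1000, 10, 100))
print(label_instance(500, 100, 10, 100))
print(label_instance(100, 100, 90, 100))
print(label_instance(500, 100, 90, 100))
```

Instances that return `None` fall in the gap between the thresholds and are left out of the training set.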
Threshold Example
Spill Loads (Opt Level 3, 8 Regs)
[Bar chart: spill loads for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 1.8.]
Benchmark Running Times (Opt Level 3, 8 Regs)
[Bar chart: running times for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 1.2.]
Register Allocation Stats (Opt Level 3, 8 Regs)
REG ALG    Run Time    Allocation Cost
GC         91.9%       100%
B0C0       93.4%       83.0%
B8kC0      93.1%       71.2%
B64kC0     93.7%       66.7%
B0C50      93.3%       82.4%
B8kC50     94.0%       40.9%
B64kC50    96.6%       27.9%
LS         100%        13.0%
Register Allocation Cost (Opt Level 3, 8 Regs)
[Bar chart: allocation cost for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 1.2.]
Hybrid Register Allocation is Successful
Significantly reduce register allocation time
  Reduced allocation time by 60%
Preserve benefit of graph coloring
  Achieved 93% of graph coloring benefit
LOCO effective for this heuristic
Case Study 2: Instruction Scheduling Filters
Motivation
Instruction scheduling: important
  Improvements over 15%
But:
  Expensive
  Frequently not beneficial
Problem: Can we predict which blocks benefit from scheduling?
Solution
Features of block predict when to schedule
Heuristic controls scheduling
  Reduces cost of scheduling
  Retains benefit of scheduling
Successful with simple features
Filter for applying scheduler
An Optimization Filter
Features of Block
Features                             Kind               Meaning
BBLen                                Block size         Number of instructions
Load, Store, Branch, Call, Return    Operation          Fraction of that type of instruction
Integer, Float, System               Functional unit    Fraction of instructions that execute on that FU
PEI, GC, Yield, Thread               Hazard             Fraction of that type of hazard instruction
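A feature vector of this shape can be computed in one pass over a block. The sketch below is only an illustration: the category names follow the table, but the instruction encoding (each instruction as a set of category tags) and the function name are hypothetical and do not reflect how Jikes RVM represents instructions.

```python
from collections import Counter

# Category names follow the feature table; the encoding itself is hypothetical.
OPERATIONS = ("load", "store", "branch", "call", "return")
FUNCTIONAL_UNITS = ("integer", "float", "system")
HAZARDS = ("pei", "gc", "yield", "thread")

def block_features(block):
    """Compute BBLen plus the fraction of instructions in each category.

    block is a list of sets; each set holds the category tags of one instruction.
    """
    bb_len = len(block)
    counts = Counter(tag for inst in block for tag in inst)
    features = {"bbLen": bb_len}
    for tag in OPERATIONS + FUNCTIONAL_UNITS + HAZARDS:
        # An empty block gets 0.0 for every fraction.
        features[tag] = counts[tag] / bb_len if bb_len else 0.0
    return features

block = [
    {"load", "integer", "pei"},
    {"integer"},
    {"float"},
    {"store", "integer", "pei"},
]
feats = block_features(block)
print(feats["bbLen"], feats["load"], feats["integer"], feats["pei"])
```

Because an instruction can carry several tags (an operation kind, a functional unit, and a hazard), the fractions across different kinds do not need to sum to one.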
Inducing a Filter
Construct cheap-to-compute features of a block
Obtain training instances that include:
  Features of the block
  Labels (Scheduling benefit to block)
Induce a filter using LOCO
  We used rule induction
Use the filter to control when compiler schedules
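The last step, using the filter inside the compiler, might look like the sketch below. Everything here is hypothetical: the rule in `hypothetical_filter` is made up for illustration and is not a rule induced by LOCO, and `reversed` merely stands in for a real scheduler.

```python
def hypothetical_filter(features):
    """Made-up rule for illustration only; not a rule induced by LOCO."""
    return features["bbLen"] >= 8 and features["load"] >= 0.25

def compile_blocks(blocks, should_schedule, schedule):
    """Schedule only the blocks the filter selects; leave the rest untouched.

    blocks is a list of (features, instructions) pairs.
    Returns the resulting instruction lists and the number of blocks scheduled.
    """
    scheduled = 0
    out = []
    for features, instructions in blocks:
        if should_schedule(features):
            instructions = schedule(instructions)
            scheduled += 1
        out.append(instructions)
    return out, scheduled

blocks = [
    ({"bbLen": 10, "load": 0.4}, ["i%d" % n for n in range(10)]),
    ({"bbLen": 3, "load": 0.0}, ["a", "b", "c"]),
]
# reversed() is a placeholder for a real list scheduler.
result, n_scheduled = compile_blocks(blocks, hypothetical_filter,
                                     lambda insts: list(reversed(insts)))
print(n_scheduled)
```

The point of the design is that evaluating the filter costs far less than running the scheduler, so skipping blocks saves compile time.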
Block Timing Estimator
Estimate of cycles to execute block
  Simple model of real machine
Determines cost of block in isolation
Relative cycle differences important
  Not absolute cycle counts
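To make the idea of a block timing estimator concrete, here is a minimal sketch. It assumes a single-issue, in-order machine, which is much simpler than the real PowerPC 7410; the latencies, instruction names, and function name are all hypothetical and are not taken from the talk.

```python
def estimate_cycles(block):
    """Estimate cycles for one block on a single-issue, in-order machine.

    block is a list of (name, latency, dependencies) tuples in program order.
    Dependencies must name instructions that appear earlier in the same block,
    since the block is costed in isolation.
    """
    ready = {}        # name -> cycle at which its result is available
    next_issue = 0    # earliest cycle at which the next instruction may issue
    finish = 0
    for name, latency, deps in block:
        # Stall until every operand is ready.
        issue = max([next_issue] + [ready[d] for d in deps])
        ready[name] = issue + latency
        next_issue = issue + 1
        finish = max(finish, ready[name])
    return finish

# Hypothetical latencies: loads take 3 cycles, adds take 1.
unscheduled = [
    ("load_a", 3, []),
    ("add_1", 1, ["load_a"]),
    ("load_b", 3, []),
    ("add_2", 1, ["load_b"]),
]
scheduled = [
    ("load_a", 3, []),
    ("load_b", 3, []),
    ("add_1", 1, ["load_a"]),
    ("add_2", 1, ["load_b"]),
]
before = estimate_cycles(unscheduled)
after = estimate_cycles(scheduled)
print(before, after, (before - after) / before)
```

Only the relative difference between the two orderings matters for labeling, so the model does not need to predict absolute cycle counts accurately.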
Labeling using Thresholds
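A threshold-based labeling step for blocks can be sketched as follows. This is an assumption-laden illustration: the treatment of blocks that fall between zero improvement and the threshold (dropped from training) is inferred from the "threshold gap" idea, and the label strings and function name are hypothetical.

```python
def label_block(cycles_unscheduled, cycles_scheduled, threshold=0.10):
    """Label a block from its estimated cycle counts.

    Returns "schedule" if the estimated improvement reaches the threshold,
    "no-schedule" if scheduling gives no improvement, and None for blocks
    in the gap, which are ASSUMED to be left out of the training set.
    """
    improvement = (cycles_unscheduled - cycles_scheduled) / cycles_unscheduled
    if improvement >= threshold:
        return "schedule"
    if improvement <= 0:
        return "no-schedule"
    return None

print(label_block(100, 85))    # 15% estimated improvement
print(label_block(100, 100))   # no improvement
print(label_block(100, 95))    # 5% improvement, inside the gap
```

Raising the threshold widens the gap, which leaves fewer but more clearly separated training instances.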
Running Time with Filtering
[Bar chart: running time as a ratio to not scheduling (y-axis, 80 to 105) for aes, bh, linpack, power, voronoi, scimark, and geomean; series: thresholds 0%, 5%, 10%, 15%, 20%, 25%, and LS.]
Scheduling Time with Filtering
[Bar chart: scheduling time as a fraction of LS time (y-axis, 0 to 100) for aes, bh, linpack, power, voronoi, scimark, and geomean; series: thresholds 0%, 5%, 10%, 15%, 20%, and 25%.]
Filtering Statistics
[Bar chart: Sched Blocks, Sched Insts, Filter/Sched, and Sched/Comp (y-axis, 0% to 40%); series: thresholds 0%, 5%, 10%, 15%, 20%, and 25%.]
Filters are Successful
Significantly reduce scheduling time
  Reduced scheduling time by 75%
Preserve benefit of scheduling
  Achieved 93% of scheduling benefit
LOCO effective for this heuristic
Related Work
Supervised learning
  Loop-unrolling and tiling
Genetic algorithms
  Hyperblocks, reg allocation, prefetching (MIT)
  Application-specific compilation strategy (Rice)
Reinforcement learning
  Used to induce heuristic for scheduling (UMass)
  We argue LOCO is better
Future Work
More work on filters
  Inlining and SSA-based opts
More work on hybrid optimizations
  Garbage collection
More work on priority functions
  Register allocation spill heuristic
Use LOCO anywhere a heuristic is used
Conclusion
LOCO effective at constructing heuristics
  Faster than most alternatives
LOCO can lead to insights
  More readable than other alternatives
LOCO heuristics competitive
  Comparable to hand-tuned heuristics
LOCO easier to use
Spill Loads (Opt Level 1, 8 Regs)
[Bar chart: spill loads for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 2.5.]
Register Allocation Cost (Opt Level 1, 8 Regs)
[Bar chart: allocation cost for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 1.2.]
Benchmark Running Times (Opt Level 1, 8 Regs)
[Bar chart: running times for GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, and LS; y-axis from 0 to 1.2.]