
Intraprocedural Optimizations

Jonathan Bachrach
MIT AI Lab


Outline

• Goal: eliminate abstraction overhead using static analysis and program transformation
• Topics:
  – Intraprocedural type inference
  – Static method selection
  – Specialization and Inlining
  – Static class prediction
  – Splitting
  – Box/unboxing
  – Common Subexpression Elimination
  – Overflow and range checks
  – Partial evaluation revisited
• Partially based on: Chambers’ “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial


Running Example

(dg + ((x <num>) (y <num>) => <num>))

(dm + ((x <int>) (y <int>) => <int>)
  (%ib (%i+ (%iu x) (%iu y))))

(dm + ((x <flo>) (y <flo>) => <flo>)
  (%fb (%f+ (%fu x) (%fu y))))

(dm x2 ((x <num>) => <num>) (+ x x))
(dm x2 ((x <int>) => <int>) (+ x x))

• Anatomy of Pure Proto Arithmetic
  – Dispatch
  – Boxing
  – Overflow checks
  – Actual instruction
• C Arithmetic
  – Actual instruction


Biggest Inefficiencies

• Method dispatch
• Method calls
• Boxing
• Type checks
• Overflow and range checks
• Slot access
• Object creation


Intraprocedural Type Inference

• Goal: determine concrete class(es) of each variable and expression
• Standard data flow analysis through the control flow graph
  – Propagate bindings b -> { class … }
  – Sources are literals, isa expressions, results of some primitives, and type declarations
  – Form unions of bindings at merge points
  – Narrow sets after typecases
  – Assumes closed world (or at least final classes)


Type Inference Example

(set x (isa <tab> …))               ;; x in { <tab> }
(set y (table-growth-factor x))     ;; y in { <int> <flo> }
(set z (if t x y))                  ;; z in { <tab> <int> <flo> }
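The propagation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: class names are plain strings, the three expression kinds (`"isa"`, `"call"`, `"if"`) are a hypothetical mini-IR rather than Proto itself, the possible result classes of `table-growth-factor` are supplied by hand, and subclassing is ignored (which matches the closed-world, final-class assumption).

```python
def infer(stmts, result_classes):
    """Map each variable to the set of classes it may hold.

    stmts is a list of (var, expr) assignments; expr is one of
      ("isa", cls)   object creation, a precise source
      ("call", fn)   call whose possible result classes are given
      ("if", a, b)   value is either variable a or variable b
    """
    env = {}
    for var, expr in stmts:
        kind = expr[0]
        if kind == "isa":
            env[var] = {expr[1]}
        elif kind == "call":
            env[var] = set(result_classes[expr[1]])
        elif kind == "if":
            env[var] = env[expr[1]] | env[expr[2]]  # union at the merge point
        else:
            raise ValueError(f"unknown expression kind: {kind}")
    return env


def narrow(classes, cls):
    """Split a class set on an isa? test: (then-branch set, else-branch set)."""
    return classes & {cls}, classes - {cls}


program = [
    ("x", ("isa", "<tab>")),
    ("y", ("call", "table-growth-factor")),
    ("z", ("if", "x", "y")),
]
env = infer(program, {"table-growth-factor": ["<int>", "<flo>"]})
```

Here `env["z"]` is the union of the sets for `x` and `y`, and `narrow(env["y"], "<int>")` gives the sets that hold on the two branches of an `isa?` test.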


Narrowing Type Precision

(if (isa? x <int>) (+ x 1) (+ x 37.0))

(if (isa? x <int>)
    (let (([x <int>] x)) (+ x 1))
    (let (([x !<int>] x)) (+ x 37.0)))


Static Method Selection

(set x (isa <tab> …))               ;; x in { <tab> }
(set y (table-growth-factor x))     ;; y in { <int> <flo> }
(print out y)

• If only one class is statically possible, dispatch can be performed statically:
(set y (<tab>:table-growth-factor x))

• If a few classes are statically possible, a typecase can be inserted:
(sel (class-of y)
  ((<int>) (<int>:print y))
  ((<flo>) (<flo>:print y)))
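The two cases above can be sketched as a rewrite over calls. This is a minimal Python sketch, not the compiler's actual algorithm: calls are tuples, only one-argument calls are handled, the `methods` table mapping `(generic, class)` to a directly callable method name is hypothetical, and the cutoff of three classes for emitting a typecase is an arbitrary choice.

```python
def select(generic, arg, classes, methods):
    """Rewrite a one-argument call given the inferred class set of its argument."""
    classes = sorted(classes)
    if classes and all((generic, c) in methods for c in classes):
        if len(classes) == 1:
            # Exactly one possible class: dispatch statically.
            return (methods[(generic, classes[0])], arg)
        if len(classes) <= 3:  # arbitrary cutoff for this sketch
            # A few possible classes: emit a typecase with one direct call per arm.
            arms = tuple(((c,), (methods[(generic, c)], arg)) for c in classes)
            return ("sel", ("class-of", arg)) + arms
    # Otherwise leave the call to dynamic dispatch.
    return (generic, arg)


methods = {
    ("table-growth-factor", "<tab>"): "<tab>:table-growth-factor",
    ("print", "<int>"): "<int>:print",
    ("print", "<flo>"): "<flo>:print",
}
```

With this table, `select("table-growth-factor", "x", {"<tab>"}, methods)` yields the direct call, and `select("print", "y", {"<int>", "<flo>"}, methods)` yields a `sel` form with one arm per class.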


Type Check Removal

• Type inference can be used to remove type checks and casts whose outcome is statically known

(set x (isa <tab> …))               ;; x in { <tab> }
(if (isa? x <tab>) (go) (stop))
==>
(set x (isa <tab> …))               ;; x in { <tab> }
(go)


Intraprocedural Type Inference Critique

• Pros:
  – Simple
  – Fast
  – Fewer dependents
• Cons:
  – Limited type precision
    • No result types
    • No incoming arg types
    • No slot types
    • Etc.


Specialization

• Q: How can we improve intraprocedural type inference precision?
• A: Specialization, which is the cloning of methods with narrowed argument types
• Improves type precision of callee by contextualizing body:

(dm sqr ((x <num>) (y <num>)) (* x y))
==>
(dm sqr ((x <int>) (y <int>)) (* x y))
(dm sqr ((x <flo>) (y <flo>)) (* x y))

• Must make sure super calls still mean the same thing
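As a rough illustration of the cloning step, here is a Python sketch over a hypothetical method representation `(name, params, body)`. It only copies the method with narrower declared classes; choosing which signatures are worth cloning, and keeping super calls correct, are left out.

```python
def specialize(method, signatures):
    """Clone `method` once per narrowed argument signature.

    method is (name, params, body), with params a tuple of (var, class) pairs.
    The body is shared unchanged; only the declared classes get narrower,
    which is what lets intraprocedural inference do better inside each clone.
    """
    name, params, body = method
    clones = []
    for sig in signatures:
        if len(sig) != len(params):
            raise ValueError("signature arity mismatch")
        narrowed = tuple((var, cls) for (var, _), cls in zip(params, sig))
        clones.append((name, narrowed, body))
    return clones


sqr = ("sqr", (("x", "<num>"), ("y", "<num>")), ("*", "x", "y"))
clones = specialize(sqr, [("<int>", "<int>"), ("<flo>", "<flo>")])
```

Each clone can then be run through type inference and method selection separately, so `*` can be resolved statically inside the `<int>` and `<flo>` copies.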


Specialization of Constructors

• Crucial to get object creation to be fast
• Specialization can be used to build custom constructors

(def <thingy> (isa <any>))
(slot <thingy> thingy-x 0)
(slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
(slot <thingy> thingy-cache (fab <tab>))

(df thingy-isa (x tracker cache)
  (let ((thingy (clone <thingy>)))
    (unless (== x nul)
      (set (%slot-value thingy thingy-x) x))
    (set (%slot-value thingy thingy-tracker)
         (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
    (set (%slot-value thingy thingy-cache)
         (if (== cache nul) (fab <tab>) cache))
    thingy))


Inlining

• Q: Can we do better?
• A: Inlining can improve specialization by inserting specialized body
• Improves type precision at call-site by contextualizing body (includes result types):

(dm f ((x <int>) (y <int>)) (+ (g x y) 1))
(dm g (x y) (+ x y))
==>
(dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
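The rewrite above amounts to substituting arguments for parameters in the callee's body. A minimal Python sketch follows, assuming a toy IR of nested tuples with no binding forms, non-recursive definitions, and arguments that are safe to duplicate (variables, literals, or pure expressions); the `defs` table is hypothetical.

```python
def substitute(expr, env):
    """Replace variables in expr according to env (the toy IR has no binders)."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple) and expr:
        head, *args = expr
        # The head names a function, so only the arguments are substituted.
        return (head, *(substitute(a, env) for a in args))
    return expr  # literals


def inline(expr, defs):
    """Inline calls to the functions in defs, where defs maps name -> (params, body)."""
    if not isinstance(expr, tuple) or not expr:
        return expr
    head, *args = expr
    args = [inline(a, defs) for a in args]
    if isinstance(head, str) and head in defs:
        params, body = defs[head]
        if len(params) == len(args):
            # Substitute arguments into the body, then keep inlining the result.
            return inline(substitute(body, dict(zip(params, args))), defs)
    return (head, *args)


defs = {"g": (("x", "y"), ("+", "x", "y"))}
body_of_f = inline(("+", ("g", "x", "y"), 1), defs)
```

After inlining, the body of `f` contains only `+` calls on variables declared `<int>`, so type inference and method selection can continue inside what used to be an opaque call.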


Synergy: Method Selection + Inlining

(df f ((x <int>) (y <int>))
  (+ x y))

;; method selection
(df f ((x <int>) (y <int>))
  (<int>:+ x y))

;; inlining
(df f ((x <int>) (y <int>))
  (%ib (%i+ (%iu x) (%iu y))))


Pitfalls of Inlining and Specialization

• Must control inlining and specialization carefully to avoid code bloat
• Inlining can be driven by syntactic size alone, trying never to make the code larger than the original call
• Class-centric specialization usually works by copying down inherited methods and tightening up self references (harder for multimethods)
• Can run inlining/specialization trials based on
  – Final static size
  – Performance feedback


Class Centric Specialization

(def <point> (isa <any>))
(slot <point> (point-x <int>) 0)
(dm point-move ((p <point>) (offset <num>))
  (set (point-x p) (+ (point-x p) offset)))
(def <color-point> (isa <point>))

==>

(dm point-move ((p <color-point>) (offset <num>))
  (set (point-x p) (+ (point-x p) offset)))


Static Class Prediction

• Can improve type precision in cases where, for a given generic, one particular method is called much more frequently than the others
• Insert type check testing prediction
  – Can narrow type precision along then and else branches
• Especially useful in combination with inlining


Static Class Prediction Example

(df f (x)
  (let ((y (+ x 1)))
    (+ y 2)))

(df f (x)
  (let ((y (if (isa? x <int>) (+ x 1) (+ x 1))))
    (if (isa? y <int>) (+ y 2) (+ y 2))))

(df f (x)
  (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1))))
    (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))


Synergy: Class Prediction + Method Selection + Inlining

(df f (x)
  (let ((y (if (isa? x <int>) (+ x 1) (+ x 1))))
    (if (isa? y <int>) (+ y 2) (+ y 2))))

;; method selection
(df f (x)
  (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1))))
    (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))

;; inlining
(df f (x)
  (let ((y (if (isa? x <int>) (%ib (%i+ (%iu x) %1)) (+ x 1))))
    (if (isa? y <int>) (%ib (%i+ (%iu y) %2)) (+ y 2))))


Splitting

• Problem: Class prediction often leads to a bunch of redundant type tests
• Solution: Split off whole sections of graph specialized to particular class on variable
  – Can split off entire loops
  – Can specialize on other dataflow information


Splitting Example

(df f (x)
  (let ((y (+ x 1)))
    (+ y 2)))

(df f (x)
  (if (isa? x <int>)
      (let ((y (+ x 1))) (+ y 2))
      (let ((y (+ x 1))) (+ y 2))))

(df f (x)
  (if (isa? x <int>)
      (let ((y (<int>:+ x 1))) (<int>:+ y 2))
      (let ((y (+ x 1))) (+ y 2))))
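The two steps in this example (duplicate the body under one test, then select methods in the copy where the class is known) can be sketched in Python. The tuple IR, the single-binding `let` form, and the function names are assumptions for illustration; the sketch also assumes that `<int>:+` returns an `<int>`, as the `=> <int>` declaration in the running example states.

```python
def select_int(expr, ints):
    """Rewrite (+ a b) to (<int>:+ a b) when every operand is known to be <int>.

    ints is the set of variables known to hold an <int>.
    Returns (new_expr, result_is_int).
    """
    if isinstance(expr, int):
        return expr, True
    if isinstance(expr, str):
        return expr, expr in ints
    head, *args = expr
    if head == "let":  # (let ((var init)) body), one binding only
        ((var, init),), body = args
        init, init_is_int = select_int(init, ints)
        inner = ints | {var} if init_is_int else ints - {var}
        body, body_is_int = select_int(body, inner)
        return ("let", ((var, init),), body), body_is_int
    results = [select_int(a, ints) for a in args]
    new_args = tuple(r[0] for r in results)
    if head == "+" and all(r[1] for r in results):
        return ("<int>:+", *new_args), True
    return (head, *new_args), False


def split_on_int(var, body):
    """Duplicate body under one isa? test and specialize the predicted copy."""
    fast, _ = select_int(body, {var})
    return ("if", ("isa?", var, "<int>"), fast, body)


body = ("let", (("y", ("+", "x", 1)),), ("+", "y", 2))
split_body = split_on_int("x", body)
```

Because the test on `x` is hoisted above the whole body, the knowledge that `y` is an `<int>` flows through the `let` without a second `isa?` test, which is the advantage over the class prediction version.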


Splitting Downside

• Splitting can also lead to code bloat
• Must be intelligent about what to split
  – A priori knowledge (e.g., integers most frequent)
  – Actual performance


Box / Unboxing

(df + ((x <int>) (y <int>) => <int>)
  (%ib (%i+ (%iu x) (%iu y))))

(df f ((a <int>) (b <int>) => <int>)
  (+ (+ a b) a))

;; inlining +
(df f ((a <int>) (b <int>) => <int>)
  (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))

;; remove box/unbox pair
(df f ((a <int>) (b <int>) => <int>)
  (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
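Removing the box/unbox pair is a simple peephole rewrite: `(%iu (%ib e))` becomes `e`. A minimal Python sketch over a tuple IR follows; it assumes `%ib` has no observable effect other than allocating the box, so dropping it is safe.

```python
def unbox_box(expr):
    """Remove (%iu (%ib e)) pairs bottom-up, leaving the raw value e."""
    if not isinstance(expr, tuple):
        return expr
    # Simplify subexpressions first so nested pairs are also found.
    expr = tuple(unbox_box(e) for e in expr)
    if (len(expr) == 2 and expr[0] == "%iu"
            and isinstance(expr[1], tuple) and len(expr[1]) == 2
            and expr[1][0] == "%ib"):
        return expr[1][1]
    return expr


inlined = ("%ib", ("%i+",
                   ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                   ("%iu", "a")))
cleaned = unbox_box(inlined)
```

The intermediate sum is no longer allocated as a boxed integer; only the final result is boxed.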


Synergy: Splitting + Method Selection + Inlining + Box/Unboxing

(df f (x)
  (if (isa? x <int>)
      (let ((y (+ x 1))) (+ y 2))
      (let ((y (+ x 1))) (+ y 2))))

;; method selection
(df f (x)
  (if (isa? x <int>)
      (let ((y (<int>:+ x 1))) (<int>:+ y 2))
      (let ((y (+ x 1))) (+ y 2))))

(df f (x)
  (if (isa? x <int>)
      (<int>:+ (<int>:+ x 1) 2)
      (let ((y (+ x 1))) (+ y 2))))

;; inlining
(df f (x)
  (if (isa? x <int>)
      (%ib (%i+ (%iu (%ib (%i+ (%iu x) %1))) %2))
      (let ((y (+ x 1))) (+ y 2))))

;; box/unbox
(df f (x)
  (if (isa? x <int>)
      (%ib (%i+ (%i+ (%iu x) %1) %2))
      (let ((y (+ x 1))) (+ y 2))))


Common Subexpression Elimination (CSE)

• Removes redundant computations
  – Constant slot or binding access
  – Stateless/side-effect-free function calls
• Examples

(or (elt (cache x) 'a) (elt (cache x) 'b))
==>
(let ((t (cache x)))
  (or (elt t 'a) (elt t 'b)))

(if (< i 0) (if (< i 0) (go) (putz)) (dance))
==>
(if (< i 0) (go) (dance))
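The first example can be sketched as a small Python pass over a tuple IR. Several assumptions are made to keep it short: the set `pure` of side-effect-free functions is given, only repeated calls whose arguments are all atoms are hoisted (so the generated bindings never depend on each other), the names `t0`, `t1`, ... are assumed not to clash with program variables, and the pure functions are assumed total so that hoisting them out of a conditional is safe.

```python
def cse(expr, pure):
    """Bind repeated calls to pure functions once, in lets wrapped around expr."""
    counts = {}

    def count(e):
        if isinstance(e, tuple):
            counts[e] = counts.get(e, 0) + 1
            for sub in e[1:]:
                count(sub)

    def is_simple_pure_call(e):
        return e[0] in pure and not any(isinstance(a, tuple) for a in e[1:])

    count(expr)
    names = {}
    for e, n in counts.items():  # dicts preserve first-seen order
        if n > 1 and is_simple_pure_call(e):
            names[e] = f"t{len(names)}"

    def rewrite(e):
        if not isinstance(e, tuple):
            return e
        if e in names:
            return names[e]
        return (e[0], *(rewrite(sub) for sub in e[1:]))

    body = rewrite(expr)
    for e, name in reversed(list(names.items())):
        body = ("let", ((name, e),), body)
    return body


before = ("or", ("elt", ("cache", "x"), "'a"), ("elt", ("cache", "x"), "'b"))
after = cse(before, {"cache"})
```

The second example on the slide (the nested test on `(< i 0)`) needs a different mechanism, namely remembering which conditions are known to hold along each branch; this sketch does not cover it.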


Overflow and Bounds Checks
aka “Moon Challenge”

• Goal:
  – Support mathematical integers and bounds checked collection access
  – Eliminate bounds and overflow checks
• Strategy:
  – Assume most integer arithmetic and collection accesses occur in restricted loop context where range can be readily inferred
  – Perform range analysis to remove checks
    • Bound from above variables by size of collection
    • Bound from below variables by zero
    • Induction step is 1+


Range Check Example

(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (elt v i)))
        (rep (+ sum e) (+ i 1)))
      sum))

;; inlining bounds checks
(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (if (or (< i 0) (>= i (len v)))
                   (sig ...)
                   (vref v i))))
        (rep (+ sum e) (+ i 1)))
      sum))

;; CSE
(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (if (< i 0)
                   (sig ...)
                   (vref v i))))
        (rep (+ sum e) (+ i 1)))
      sum))

;; range analysis
(rep (((sum <int>) 0) ((i <int>) 0))
  (if (< i (len v))
      (let ((e (vref v i)))
        (rep (+ sum e) (+ i 1)))
      sum))
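The reasoning behind the last two steps can be written down as a very small range analysis. This Python sketch is deliberately simplistic and rests on assumptions not stated on the slide: the induction variable has a constant positive step, the loop guard is a symbolic exclusive upper bound (represented here as a string), and the collection's length does not change inside the loop. The function names are hypothetical.

```python
def index_range(init, step, guard_upper):
    """Range of an induction variable inside the guarded loop body.

    init: initial value; step: constant increment per iteration;
    guard_upper: symbolic exclusive bound tested before the body runs.
    Returns (low, high_exclusive).
    """
    if step <= 0:
        raise ValueError("sketch handles only increasing induction variables")
    # The variable never drops below init, and the guard bounds it from above.
    return init, guard_upper


def checks_needed(rng, collection_len):
    """List the bounds checks that cannot be proven redundant."""
    low, high = rng
    needed = []
    if low < 0:
        needed.append("lower")
    if high != collection_len:  # cannot prove index < length
        needed.append("upper")
    return needed


loop_range = index_range(0, 1, "(len v)")
remaining = checks_needed(loop_range, "(len v)")
```

For the loop in the example, `remaining` is empty: the guard `(< i (len v))` covers the upper check, and starting at 0 with a step of 1 covers the lower one, so `elt` can be replaced by the unchecked `vref`.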


Overflow Check Removal aka “Moon Challenge” Critique

• Pros:
  – Simple analysis
• Cons:
  – Could miss a number of cases
    • But then previous approaches (e.g., box/unbox) could be applied


Advanced Topic: Representation Selection

• Embed objects in others to remove indirections
• Change object representation over time
• Use minimum number of bits to represent enums
• Pack fields in objects


Advanced Topic: Algorithm Selection

• Goal: compiler determines that one algorithm is more appropriate for given data
  – Sorted data
  – Biased data
• Solution:
  – Embed statistics gathering in runtime
  – Add guards to code and split


Rule-based Compilation

• First millennium compilers were based on special rules for
  – Method selection
  – Pattern matching
  – Oft-used system functions like format
• Problems
  – Error prone
  – Don’t generalize to user code
• Challenge
  – Minimize number of rules
  – Competitive compiler speed
  – Produce competitive code


Partial Evaluation to the Rescue

• Holy grail idea:
  – Optimizations are manifest in code
  – Do previous optimizations with only p.e.
• Simplify compiler based on limited moves
  – Static eval and folding
  – Inlining
• Eliminate
  – Custom method selection
  – Custom constructor optimization
  – Etc.


Partial Eval Example

(dm format (port msg (args …))
  (rep nxt ((i 0) (ai 0))
    (when (< i (len msg))
      (let ((c (elt msg i)))
        (if (= c #\%)
            (seq (print port (elt args ai))
                 (nxt (+ i 1) (+ ai 1)))
            (seq (write port c)
                 (nxt (+ i 1) ai)))))))

(format out “%> ” n)

• First millennium solution is to have a custom optimizer for format

(seq (print port n) (write port “> ”))

• Second millennium solution with partial evaluation

(nxt 0 0)

(seq (print port n) (nxt 1 1))

(seq (print port n) (seq (write port #\>) (nxt 2 1)))

(seq (print port n) (seq (write port #\>) (seq (write port #\space))))
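The unfolding above works because `msg` is known at compile time, so the loop index and every test on `c` can be evaluated statically while the arguments stay symbolic. The Python sketch below mimics that effect for `format` only. It is a hand-written specializer rather than a general partial evaluator, and the residual operation tuples and the function name are assumptions for illustration.

```python
def specialize_format(msg, arg_names):
    """Unfold format's loop over a statically known msg.

    Returns the residual code as a flat (seq ...) of print/write operations.
    Argument values are unknown at this point, so only their names appear.
    """
    residual = []
    ai = 0
    for c in msg:  # the index is static, so the loop unrolls completely
        if c == "%":
            if ai >= len(arg_names):
                raise ValueError("more % directives than arguments")
            residual.append(("print", "port", arg_names[ai]))
            ai += 1
        else:
            residual.append(("write", "port", c))
    return ("seq", *residual)


residual_code = specialize_format("%> ", ["n"])
```

For a message such as `"%> "` the residual code is one `print` followed by two `write` operations, with no loop, no character tests, and no indexing into `args` left at run time.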


Partial Eval Challenge

• Inlining and static eval are slow
  – “Running” code through inlining
  – Need to compile oft-used optimizations
• Residual code is not necessarily efficient
  – Sometimes algorithmic change is necessary for optimal efficiency
    • Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting
• Perhaps there is a middle ground


Open Problems

• Automatic inlining, splitting, and specialization
• Efficient mathematical integers
• Constant determination
• Representation selection
• Algorithmic selection
• Efficient partial evaluation
• Super compiler that runs for days


Reading List

• Chambers: “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial

• Chambers and Ungar: SELF papers
• Chambers et al.: Vortex papers