How to select superinstructions for Ruby
ZAKIROV Salikh*, CHIBA Shigeru*, and SHIBAYAMA Etsuya**
* Tokyo Institute of Technology, Dept. of Mathematical and Computing Sciences
** Tokyo University, Information Technology Center
Ruby
• Dynamic language
• Becoming popular recently
• Numeric benchmarks are 100 to 1000 times slower than equivalent programs in C*
[Figure: benchmark results, with numeric benchmarks marked in red]
* http://shootout.alioth.debian.org/
Interpreter optimization efforts
• Many techniques have been proposed to optimize interpreters
– Threaded interpretation
– Stack top caching
– Pipelining
– Superinstructions (focus of this presentation)
• Superinstructions
– Merge code of operations executed in sequence
Superinstructions (contrived example)
PUSH: // put <imm> argument on stack
    stack[sp++] = *pc++;
    goto **pc++;

ADD: // add two topmost values on stack
    sp--;
    stack[sp-1] += stack[sp];
    goto **pc++;

Dispatch eliminated:

PUSH_ADD: // add <imm> to stack top
    stack[sp++] = *pc++;
    //goto **pc++;
    sp--;
    stack[sp-1] += stack[sp];
    goto **pc++;

Optimizations applied:

PUSH_ADD: // add <imm> to stack top
    stack[sp-1] += *pc++;
    goto **pc++;
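The effect of the merged PUSH_ADD operation can be tried out in a small self-contained interpreter. This is only a minimal sketch under stated assumptions, not the presenters' code: it uses a portable `switch` loop rather than the computed `goto **pc++` dispatch shown on the slide, and the names `run`, `Result`, `OP_HALT`, `plain_program` and `fused_program` are hypothetical. It counts loop iterations to make the saved dispatch visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; OP_PUSH_ADD is the superinstruction. */
enum { OP_PUSH, OP_ADD, OP_PUSH_ADD, OP_HALT };

typedef struct {
    long top;        /* value left on top of the stack */
    int dispatches;  /* number of times the dispatch loop ran */
} Result;

/* Runs a program of opcodes and immediates, starting with `initial` on the stack. */
Result run(const long *pc, long initial)
{
    long stack[16];
    size_t sp = 0;
    Result r = { 0, 0 };

    stack[sp++] = initial;
    for (;;) {
        r.dispatches++;                 /* one dispatch per VM operation */
        switch (*pc++) {
        case OP_PUSH:                   /* put <imm> argument on stack */
            assert(sp < 16);
            stack[sp++] = *pc++;
            break;
        case OP_ADD:                    /* add two topmost values on stack */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_PUSH_ADD:               /* add <imm> to stack top, no push/pop */
            assert(sp >= 1);
            stack[sp - 1] += *pc++;
            break;
        default:                        /* OP_HALT (or an unknown opcode) */
            r.top = stack[sp - 1];
            return r;
        }
    }
}

/* The same computation, initial + 2, without and with the superinstruction. */
const long plain_program[] = { OP_PUSH, 2, OP_ADD, OP_HALT };
const long fused_program[] = { OP_PUSH_ADD, 2, OP_HALT };
```

Calling `run(plain_program, 40)` and `run(fused_program, 40)` leaves the same value on the stack, but the fused program goes through the dispatch loop one time fewer and never touches the stack pointer.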
Superinstructions (effects)
• Effects
1. Reduce dispatch overhead
a. Eliminate some jumps
b. Provide more context for the indirect branch predictor by replicating indirect jump instructions
2. Allow more optimizations within a VM op
Good for reducing dispatch overhead
Superinstructions help when:
• VM operations are small (~10 hwop/vmop)
• Dispatch overhead is high (~50%)
Examples of successful use in prior research
• ANSI C interpreter: 2-3 times improvement (Proebsting 1995)
• OCaml: more than 50% improvement (Piumarta 1998)
• Forth: 20-80% improvement (Ertl 2003)
Superinstructions help when:
• VM operations are small (~10 hwop/vmop)
• Dispatch overhead is high (~50%)
BUT Ruby does not fit well
Hardware profiling data on Intel Core 2 Duo:
• 60-140 hardware ops per VM op
• Only 1-3% misprediction overhead on interpreter dispatch
Superinstructions for Ruby
• We experimentally evaluated the effect of "naive" superinstructions on Ruby
– Superinstructions are selected statically
– Combinations of length 2 that occur frequently in a training run are selected as superinstructions
– The training run uses the same benchmark
– Superinstructions are constructed by concatenating C source code, with C compiler optimizations applied
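The static selection step, picking frequent length-2 combinations from a training run, can be sketched as a pair-counting pass over an opcode trace. This is an illustrative sketch, not the tooling used in the study: the opcode numbering, `NUM_OPS`, the `Pair` type, `most_frequent_pair` and `sample_trace` are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OPS 4   /* hypothetical opcode count for the sketch */

typedef struct {
    int first, second;   /* the pair of adjacent opcodes */
    unsigned count;      /* how often the pair occurred in the trace */
} Pair;

/* Returns the most frequent adjacent opcode pair in a training trace.
   On a tie, the pair found first in (first, second) order is kept. */
Pair most_frequent_pair(const int *trace, size_t len)
{
    unsigned counts[NUM_OPS][NUM_OPS] = { { 0 } };
    Pair best = { -1, -1, 0 };

    /* Count every pair of operations executed back to back. */
    for (size_t i = 0; i + 1 < len; i++) {
        assert(trace[i] >= 0 && trace[i] < NUM_OPS);
        assert(trace[i + 1] >= 0 && trace[i + 1] < NUM_OPS);
        counts[trace[i]][trace[i + 1]]++;
    }
    /* Pick the pair with the highest count. */
    for (int a = 0; a < NUM_OPS; a++) {
        for (int b = 0; b < NUM_OPS; b++) {
            if (counts[a][b] > best.count) {
                best.first = a;
                best.second = b;
                best.count = counts[a][b];
            }
        }
    }
    return best;
}

/* Made-up trace over opcodes 0..3, for illustration only. */
const int sample_trace[] = { 1, 2, 3, 1, 2, 3, 1, 2 };
```

For `sample_trace`, the pair (1, 2) occurs three times and the others twice, so it would be the first candidate for a superinstruction. Selecting the top N pairs instead of one is a straightforward extension of the second loop.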
Naive superinstructions effect on Ruby
[Figure: normalized execution time vs. number of superinstructions used, 4 benchmarks]
• Limited benefit
• Unpredictable effects
Branch mispredictions
[Figure: normalized execution time vs. number of superinstructions used; 2 benchmarks: mandelbrot and spectral_norm]
Branch mispredictions, reordered
[Figure: normalized execution time vs. number of superinstructions used, reordered by execution time; 2 benchmarks: mandelbrot and spectral_norm]
So why is Ruby slow?
• Profile of numeric benchmarks
– Garbage collection takes significant time
– Boxed floating point values dominate allocation
Floating point value boxing
Typical Ruby 1.9 VM operation:

OPT_PLUS:
    VALUE a = *(sp-2);
    VALUE b = *(sp-1);
    /* ... */
    if (CLASS_OF(a) == Float && CLASS_OF(b) == Float) {
        sp--;
        *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b));
    } else {
        CALL(1/*argnum*/, PLUS, a);
    }
    goto **pc++;

A new "box" object is allocated on each operation.
Proposal: use superinstructions for boxing optimization
• 2 operations per allocation instead of 1

OPT_MULT_OPT_PLUS:
    VALUE a = *(sp-3);
    VALUE b = *(sp-2);
    VALUE c = *(sp-1);
    /* ... */
    if (CLASS_OF(a) == Float && CLASS_OF(b) == Float && CLASS_OF(c) == Float) {
        sp -= 2;
        *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)*DOUBLE_VALUE(c));
    } else {
        CALL(1/*argnum*/, MULT/*method*/, b/*receiver*/);
        CALL(1/*argnum*/, PLUS/*method*/, a/*receiver*/);
    }
    goto **pc++;

Boxing of the intermediate result is eliminated.
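The allocation saving can be made concrete with a toy model of boxed floats. This is a sketch under stated assumptions, not Ruby VM code: `Box`, `new_float`, `mult_then_plus`, `fused_mult_plus` and `allocations_for` are hypothetical names, `malloc` stands in for the VM's object allocator, and explicit `free` stands in for garbage collection.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-allocated Ruby Float. */
typedef struct { double value; } Box;

unsigned allocations = 0;   /* counts boxes created so far */

Box *new_float(double v)
{
    Box *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = v;
    allocations++;
    return b;
}

/* a + b*c as two separate VM operations: the product is boxed, then the sum. */
Box *mult_then_plus(const Box *a, const Box *b, const Box *c)
{
    Box *product = new_float(b->value * c->value);   /* intermediate box */
    Box *sum = new_float(a->value + product->value);
    free(product);   /* a real VM would leave this to the garbage collector */
    return sum;
}

/* The same computation as one superinstruction: only the result is boxed. */
Box *fused_mult_plus(const Box *a, const Box *b, const Box *c)
{
    return new_float(a->value + b->value * c->value);
}

/* Helper for the sketch: allocations performed by one evaluation of a + b*c. */
unsigned allocations_for(int fused)
{
    Box a = { 1.0 }, b = { 2.0 }, c = { 3.0 };
    unsigned before = allocations;
    Box *r = fused ? fused_mult_plus(&a, &b, &c) : mult_then_plus(&a, &b, &c);
    assert(r->value == 7.0);
    free(r);
    return allocations - before;
}
```

In this model the unfused path allocates two boxes per `a + b*c` and the fused path allocates one, which is the "2 operations per allocation instead of 1" ratio from the slide.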
Implementation
• VM operations that handle floating point values directly:
– opt_plus
– opt_minus
– opt_mult
– opt_div
– opt_mod
• We implemented all 25 combinations of length 2
– Based on Ruby 1.9.1
– Using the existing Ruby infrastructure for superinstructions, with some modifications
Limitations
• Coding style-sensitive
• Not applicable to other types (e.g. Fixnum, Bignum, String)
– Fixnum is already unboxed
– Bignum and String cannot be unboxed
• Sequences of 3 or more arithmetic instructions are virtually non-existent
– No occurrences in the benchmarks
Evaluation
• Methodology
– Median time of 30 runs
• Reduction in allocation
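For reference, the median of a set of run times can be computed as below; the median is less sensitive to occasional outlier runs than the mean. This is a generic sketch, not the measurement harness used for the evaluation: the function name `median` and the convention of averaging the two middle values for an even count (as with 30 runs) are assumptions made here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Ordering function for qsort: ascending doubles. */
static int compare_doubles(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Median of n timings; the input array is left untouched. */
double median(const double *times, size_t n)
{
    assert(n > 0);
    double *sorted = malloc(n * sizeof *sorted);
    assert(sorted != NULL);
    memcpy(sorted, times, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    /* Odd count: middle element. Even count: mean of the two middle elements. */
    double m = (n % 2) ? sorted[n / 2]
                       : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    free(sorted);
    return m;
}
```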
Results
• Up to 22% benefit on numeric benchmarks
• No slowdown on other benchmarks
Example: mandelbrot tweak
    ITER.times do
    -  tr = zrzr - zizi + cr
    +  tr = cr + (zrzr - zizi)
    -  ti = 2.0*zr*zi + ci
    +  ti = ci + 2.0*zr*zi

• A slight modification produces a 20% difference in performance
– 4 of 9 arithmetic instructions get merged into 2 superinstructions
– 24% reduction in float allocation
[Figure: normalized execution time]
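Why the reordering matters can be seen by writing the two expressions as stack code and looking for arithmetic operations that end up next to each other. The sketch below assumes a straightforward left-to-right stack compilation and collapses every operand push into a single hypothetical `LOAD` opcode; the real Ruby instruction sequences also contain the assignments and may differ in detail. `count_fusable_pairs` and the four arrays are names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified opcodes: LOAD stands for any operand push. */
enum { LOAD, MINUS, PLUS, MULT };

/* Counts non-overlapping pairs of adjacent arithmetic operations,
   scanning left to right; each such pair could become one superinstruction. */
size_t count_fusable_pairs(const int *code, size_t len)
{
    size_t pairs = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] != LOAD && code[i + 1] != LOAD) {
            pairs++;
            i++;   /* skip the second operation of the merged pair */
        }
    }
    return pairs;
}

/* tr = zrzr - zizi + cr    ->  zrzr zizi - cr +  */
const int tr_before[] = { LOAD, LOAD, MINUS, LOAD, PLUS };
/* tr = cr + (zrzr - zizi)  ->  cr zrzr zizi - +  */
const int tr_after[]  = { LOAD, LOAD, LOAD, MINUS, PLUS };
/* ti = 2.0*zr*zi + ci      ->  2.0 zr * zi * ci +  */
const int ti_before[] = { LOAD, LOAD, MULT, LOAD, MULT, LOAD, PLUS };
/* ti = ci + 2.0*zr*zi      ->  ci 2.0 zr * zi * +  */
const int ti_after[]  = { LOAD, LOAD, LOAD, MULT, LOAD, MULT, PLUS };
```

Under these assumptions the original lines contain no adjacent arithmetic pair, while each rewritten line contains one (minus followed by plus, and mult followed by plus), which is consistent with 4 arithmetic instructions being merged into 2 superinstructions.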
Discussion of alternative approaches
• Faster GC would improve performance as well
– Superinstructions still apply, but with reduced benefit
• Type inference
– Would allow specializing expressions and eliminating boxing
– Interoperability with dynamic code is an issue
• Dynamic specialization
– Topic for further research
Related work: Tagged values
• Use lower bits of pointers to trigger alternative handling
• Embed floating point value into higher bits
• Limited to 64-bit platforms, as Ruby uses double precision 64-bit floating point arithmetic
– Our approach has the same effect on 32-bit and 64-bit platforms
• Allows eliminating the majority of boxed floats
• Provides 28-35% benefit (on the same benchmarks)
* Sasada 2008
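The general idea of tagging can be sketched as follows. This is a deliberately simplified illustration and not the scheme from the cited work: it stores a single-precision `float` in the upper half of a 64-bit word so that the payload fits without loss, whereas the slide concerns Ruby's 64-bit doubles, whose encoding is not reproduced here. The `Value` type, the tag constants and the function names are all hypothetical, and the sketch assumes a 32-bit `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;            /* hypothetical 64-bit VM word */

#define TAG_MASK  ((Value)0x3)     /* 4-byte-aligned pointers have these bits clear */
#define TAG_FLOAT ((Value)0x2)     /* assumed tag value, chosen for this sketch */

/* Stores the 32 bits of a float in the upper half of the word. */
Value float_to_value(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);        /* well-defined bit copy */
    return ((Value)bits << 32) | TAG_FLOAT;
}

/* The low bits decide whether the word is an immediate float or a pointer. */
int is_float(Value v)
{
    return (v & TAG_MASK) == TAG_FLOAT;
}

/* Recovers the float without touching the heap. */
float value_to_float(Value v)
{
    assert(is_float(v));
    uint32_t bits = (uint32_t)(v >> 32);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A word such as `float_to_value(1.5f)` carries its number inline, so no box is allocated, while an aligned object address fails the `is_float` check and is handled as an ordinary pointer.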
Related work: Lazy boxing
• Java-like language with generics over value types
• Boxing needed to avoid duplication of template instantiation code for primitive types
• Lazy optimization works by allocating boxed objects in the stack frame, and moving them to the heap as needed
• Relies on static compiler analysis for escape path detection, and on runtime checks
* Owen 2004
Related work: Superinstructions
• Superinstructions used for code compression
– ANSI C hybrid compiler-interpreter
– Trimedia code compression system
– Superinstructions chosen statically to minimize code size
• Superinstructions used to reduce dispatch overhead
– Forth, OCaml
– Superinstructions chosen dynamically
* Piumarta 1998
* Proebsting 1995
* Hoogerbrugge 1999
* Ertl 2003
Conclusion
• The naive approach to superinstructions does not produce substantial benefit for Ruby
• Boxing overhead of floating point values is a problem for Ruby
• Superinstructions provide some help (up to 22%)
Future work
• Eliminate float boxing further
– Specializing computation loop