Building High-Performance Language Implementations With Low Effort
Stefan Marr
FOSDEM 2015, Brussels, Belgium
January 31st, 2015
@smarr, http://stefan-marr.de
Why should you care about how Programming Languages work?
SMBC: http://www.smbc-comics.com/?id=2088
• Performance isn’t magic
• Domain-specific languages
– More concise
– More productive
• It’s easier than it looks
– Often open source
– Contributions welcome
What’s “High-Performance”?
Competitively Fast!
[Bar chart comparing Java, V8, C#, Dart, Python, Lua, PHP, Ruby; y-axis from 0 to 18]
Based on the latest data from http://benchmarksgame.alioth.debian.org/; geometric mean over the available benchmarks.
Disclaimer: Not indicative of application performance!
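Each language above is summarized by a geometric mean over the available benchmarks. A minimal sketch of that aggregation, assuming the inputs are per-benchmark runtime ratios against some baseline (the numbers are made up for illustration, not benchmark data):

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive runtime ratios (runtime / baseline runtime)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    # Averaging in log space treats a 2x slowdown and a 2x speedup symmetrically.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical ratios for three benchmarks, not measured data.
print(round(geometric_mean([1.0, 2.0, 4.0]), 6))  # 2.0
```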
What’s “Low Effort”?
Small and Manageable
[Bar chart, log scale from 1 to 1000 KLOC, for V8 JavaScript, HotSpot Java Virtual Machine, Dart VM, and Lua 5.3 interp.; values shown: 16, 260, 525, 562 KLOC]
KLOC: 1000 Lines of Code, without blank lines and comments
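Language Implementation Approaches
• Interpreter: at run time, the interpreter executes the source program on the input to produce the output. Simple, but often slow.
• Compiler: at development time, the compiler translates the source program into a binary; at run time, the binary turns the input into the output. More complex, but often faster. Not ideal for all languages.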
Modern Virtual Machines
Virtual Machine with Just-In-Time Compilation: at run time, the interpreter executes the source program on the input and collects runtime info, which the compiler uses to generate a binary.
VMs are Highly Complex
Interpreter, Compiler, Optimizer, CodeGen, Garbage Collector, Foreign Function Interface, Threads and Memory Model, Debugging, Profiling, …
Easily 500 KLOC
How to reuse most parts for a new language?
How to reuse most parts for a new language?
Make Interpreters Replaceable Components!
[Diagram: several interchangeable interpreters on top of one shared set of components: Compiler, Optimizer, CodeGen, Garbage Collector, Foreign Function Interface, Threads and Memory Model, …]
Interpreter-based Approaches
• Truffle + Graal with Partial Evaluation (Oracle Labs) [3]
• RPython with Meta-Tracing [2]
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
SELF-OPTIMIZING TREES
A Simple Technique for Language Implementation and Optimization
[1] Würthinger, T.; Wöß, A.; Stadler, L.; Duboscq, G.; Simon, D. & Wimmer, C., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium 2012, pp. 73-82.
Code Convention
Python-ish: Interpreter Code
Java-ish: Application Code
A Simple Abstract Syntax Tree Interpreter
root_node = parse(file)
root_node.execute(Frame())

if (condition) {
  cnt := cnt + 1;
} else {
  cnt := 0;
}

[AST diagram: root_node is an `if` node with the children cond, `cnt := cnt + 1` (a write of `+` over `cnt` and `1`), and `cnt := 0`]
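The same tree can be run by a few lines of plain Python. This is a minimal sketch, assuming a hand-built AST instead of `parse(file)`, a fixed slot layout for `condition` and `cnt`, and hypothetical `Add` and `If` node classes for the parts the slide only shows as a diagram. Each node's `execute` simply evaluates its children recursively:

```python
class Frame:
    """Local variables stored by slot index, as in `frame.local_obj[idx]`."""
    def __init__(self, num_locals):
        self.local_obj = [None] * num_locals

class Literal:
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value

class VarRead:
    def __init__(self, idx):
        self.idx = idx

    def execute(self, frame):
        return frame.local_obj[self.idx]

class VarWrite:
    def __init__(self, idx, sub_expr):
        self.idx = idx
        self.sub_expr = sub_expr

    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        frame.local_obj[self.idx] = val
        return val

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute(self, frame):
        return self.left.execute(frame) + self.right.execute(frame)

class If:
    def __init__(self, cond, then_node, else_node):
        self.cond, self.then_node, self.else_node = cond, then_node, else_node

    def execute(self, frame):
        # Only the taken branch is evaluated.
        if self.cond.execute(frame):
            return self.then_node.execute(frame)
        return self.else_node.execute(frame)

# Hand-built AST for: if (condition) { cnt := cnt + 1; } else { cnt := 0; }
# Assumed slot layout: `condition` in slot 0, `cnt` in slot 1.
CONDITION, CNT = 0, 1
root_node = If(VarRead(CONDITION),
               VarWrite(CNT, Add(VarRead(CNT), Literal(1))),
               VarWrite(CNT, Literal(0)))

frame = Frame(2)
frame.local_obj[CONDITION] = True
frame.local_obj[CNT] = 41
print(root_node.execute(frame))  # 42
```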
Implementing AST Nodes
class Literal(ASTNode):
  final value
  def execute(frame):
    return value

class VarWrite(ASTNode):
  child sub_expr
  final idx
  def execute(frame):
    val := sub_expr.execute(frame)
    frame.local_obj[idx] := val
    return val

class VarRead(ASTNode):
  final idx
  def execute(frame):
    return frame.local_obj[idx]
Self-Optimization by Node Specialization
cnt := cnt + 1
[Diagram: the `cnt :=` node starts out as an uninitialized variable write and replaces itself with a specialized node]

def UninitVarWrite.execute(frame):
  val := sub_expr.execute(frame)
  return specialize(val).execute_evaluated(frame, val)

def UninitVarWrite.specialize(val):
  if val instanceof int:
    return replace(IntVarWrite(sub_expr))
  elif …:
    …
  else:
    return replace(GenericVarWrite(sub_expr))
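The rewrite step can be made concrete with a small Python sketch. It assumes a minimal `Node.replace()` that swaps a node inside its parent (here a hypothetical `Root` holder), a dictionary with separate int and object storage as the frame, and leaves out the `elif …` cases from the slide:

```python
class Node:
    parent = None

    def replace(self, new_node):
        """Swap this node for new_node inside its parent (mimics the slide's replace())."""
        self.parent.replace_child(self, new_node)
        new_node.parent = self.parent
        return new_node

class Root(Node):
    """Hypothetical parent node holding a single child."""
    def __init__(self, body):
        self.body = body
        body.parent = self

    def replace_child(self, old, new):
        if self.body is old:
            self.body = new

    def execute(self, frame):
        return self.body.execute(frame)

class Literal(Node):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value

class VarWriteBase(Node):
    def __init__(self, idx, sub_expr):
        self.idx, self.sub_expr = idx, sub_expr

    def execute(self, frame):
        return self.execute_evaluated(frame, self.sub_expr.execute(frame))

class UninitVarWrite(VarWriteBase):
    def execute_evaluated(self, frame, val):
        # First execution: pick a specialized node based on the value just seen.
        return self.specialize(val).execute_evaluated(frame, val)

    def specialize(self, val):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(val, int) and not isinstance(val, bool):
            return self.replace(IntVarWrite(self.idx, self.sub_expr))
        return self.replace(GenericVarWrite(self.idx, self.sub_expr))

class IntVarWrite(VarWriteBase):
    def execute_evaluated(self, frame, val):
        frame["ints"][self.idx] = val   # stands in for unboxed int storage
        return val

class GenericVarWrite(VarWriteBase):
    def execute_evaluated(self, frame, val):
        frame["objs"][self.idx] = val   # stands in for boxed object storage
        return val

frame = {"ints": {}, "objs": {}}
root = Root(UninitVarWrite(0, Literal(7)))
root.execute(frame)
print(type(root.body).__name__, frame["ints"][0])  # IntVarWrite 7
```

After the first run the tree no longer contains the uninitialized node, so later executions go straight to `IntVarWrite`.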
Self-Optimization by Node Specialization
cnt := cnt + 1
[Diagram: the `cnt :=` node is now an int variable write]

def IntVarWrite.execute(frame):
  try:
    val := sub_expr.execute_int(frame)
    return execute_eval_int(frame, val)
  except UnexpectedResult as e:
    return respecialize(e.result).execute_evaluated(frame, e.result)

def IntVarWrite.execute_eval_int(frame, anInt):
  frame.local_int[idx] := anInt
  return anInt
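To see the fast path and the fallback working together, here is a Python sketch of `x := y`. It assumes that `execute_int` raises `UnexpectedResult` carrying the actual value whenever the child does not produce an int, and that respecialization always falls back to a generic node; `Root`, `VarRead`, and the dictionary frame are hypothetical scaffolding:

```python
class UnexpectedResult(Exception):
    """Carries a value whose type did not match the node's specialization."""
    def __init__(self, result):
        super().__init__(result)
        self.result = result

class VarRead:
    def __init__(self, idx):
        self.idx = idx

    def execute(self, frame):
        return frame[self.idx]

    def execute_int(self, frame):
        val = frame[self.idx]
        if type(val) is int:
            return val
        raise UnexpectedResult(val)   # not an int: hand the value to the caller

class Root:
    """Hypothetical parent holding the write node, so it can be replaced."""
    def __init__(self):
        self.body = None

class IntVarWrite:
    def __init__(self, parent, idx, sub_expr):
        self.parent, self.idx, self.sub_expr = parent, idx, sub_expr

    def execute(self, frame):
        try:
            val = self.sub_expr.execute_int(frame)
            return self.execute_eval_int(frame, val)
        except UnexpectedResult as e:
            return self.respecialize(e.result).execute_evaluated(frame, e.result)

    def execute_eval_int(self, frame, an_int):
        frame[self.idx] = an_int
        return an_int

    def respecialize(self, value):
        # Assumed policy: whatever `value` is, switch to the generic node.
        new_node = GenericVarWrite(self.idx, self.sub_expr)
        self.parent.body = new_node
        return new_node

class GenericVarWrite:
    def __init__(self, idx, sub_expr):
        self.idx, self.sub_expr = idx, sub_expr

    def execute(self, frame):
        return self.execute_evaluated(frame, self.sub_expr.execute(frame))

    def execute_evaluated(self, frame, val):
        frame[self.idx] = val
        return val

# x := y, with x in slot 0 and y in slot 1
root = Root()
root.body = IntVarWrite(root, 0, VarRead(1))
frame = {1: 5}
print(root.body.execute(frame), type(root.body).__name__)  # 5 IntVarWrite
frame[1] = "five"
print(root.body.execute(frame), type(root.body).__name__)  # five GenericVarWrite
```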
Some Possible Self-Optimizations
• Type profiling and specialization
• Value caching
• Inline caching
• Operation inlining
• Library Lowering
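Inline caching from this list can be sketched in a few lines. This is a minimal monomorphic cache in Python, assuming a toy object model (`Class`, `Obj`) and a hypothetical call-site node that remembers the last receiver class and the method found for it. Production VMs typically keep several entries per call site (polymorphic inline caches):

```python
class Class:
    def __init__(self, name, methods):
        self.name, self.methods = name, methods

    def lookup(self, selector):
        return self.methods[selector]

class Obj:
    def __init__(self, cls):
        self.cls = cls

class MessageSend:
    """Call-site node with a monomorphic inline cache."""
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None
        self.lookups = 0   # counts slow-path lookups, for illustration

    def execute(self, receiver):
        cls = receiver.cls
        if cls is not self.cached_class:      # cache miss: full lookup, then remember it
            self.lookups += 1
            self.cached_class = cls
            self.cached_method = cls.lookup(self.selector)
        return self.cached_method(receiver)   # cache hit: no lookup needed

point = Class("Point", {"describe": lambda self: "a point"})
send = MessageSend("describe")
for _ in range(1000):
    send.execute(Obj(point))
print(send.lookups)  # 1
```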
Library Lowering for Array class
createSomeArray() { return Array.new(1000, 'fast fast fast'); }

class Array {
  static new(size, lambda) {
    return new(size).setAll(lambda);
  }
  setAll(lambda) {
    forEach((i, v) -> { this[i] = lambda.eval(); });
    return this;
  }
}

class Object {
  eval() { return this; }
}
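A Python sketch of this generic library path makes its cost visible. It assumes a hypothetical `send_eval` helper that stands in for the dynamic dispatch to `eval()` and counts the calls: a `Lambda` runs its block, any other object returns itself, as `Object.eval()` does above:

```python
class Lambda:
    def __init__(self, fn):
        self.fn = fn

    def eval(self):
        return self.fn()

eval_calls = 0

def send_eval(value):
    """Stand-in for dynamically dispatching `eval()` to value; counts each call."""
    global eval_calls
    eval_calls += 1
    if isinstance(value, Lambda):
        return value.eval()   # Lambda.eval(): run the block
    return value              # Object.eval(): return this

class Array:
    def __init__(self, size):
        self.items = [None] * size

    @staticmethod
    def new(size, value):
        return Array(size).set_all(value)

    def set_all(self, value):
        for i in range(len(self.items)):
            self.items[i] = send_eval(value)   # one eval dispatch per element
        return self

arr = Array.new(1000, "fast fast fast")
print(eval_calls)  # 1000
```

Even though the argument is a plain string, the generic path still performs one `eval` dispatch per element.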
Optimizing for Object Values
createSomeArray() { return Array.new(1000, 'fast fast fast'); }
[AST diagram: the method invocation `.new` with the receiver `Array` (global lookup) and the arguments `1000` (int literal) and 'fast fast fast' (string literal: an Object, but not a lambda); marked as optimization potential]
Specialized new(size, lambda)
def UninitArrNew.execute(frame):
  size := size_expr.execute(frame)
  val := val_expr.execute(frame)
  return specialize(size, val).execute_evaluated(frame, size, val)

createSomeArray() { return Array.new(1000, 'fast fast fast'); }

def UninitArrNew.specialize(size, val):
  if val instanceof Lambda:
    return replace(StdMethodInvocation())
  else:
    return replace(ArrNewWithValue())
Specialized new(size, lambda)
def ArrNewWithValue.execute_evaluated(frame, size, val):
  return Array([val] * size)

createSomeArray() { return Array.new(1000, 'fast fast fast'); }

1 specialized node vs. 1000x `this[i] = lambda.eval()` and 1000x `eval() { return this; }`
[AST diagram: the `.new` invocation on `Array` (global lookup) with `1000` (int literal) and 'fast fast fast' (string literal) is replaced by one specialized node]
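The specialization of `UninitArrNew` can be sketched in Python as well. Assumptions: Python lists play the role of arrays, a hypothetical `Holder` node is the parent that `replace()` writes into, and `StdArrNew` is a simplified stand-in for the slide's `StdMethodInvocation` that evaluates the lambda once per element:

```python
class Lambda:
    def __init__(self, fn):
        self.fn = fn

class Literal:
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value

class Holder:
    """Hypothetical parent node so that replace() has somewhere to write."""
    def __init__(self, child):
        self.child = child
        child.parent = self

    def execute(self, frame):
        return self.child.execute(frame)

class ArrNode:
    def __init__(self, size_expr, val_expr):
        self.size_expr, self.val_expr = size_expr, val_expr
        self.parent = None

    def replace(self, new_node):
        new_node.parent = self.parent
        self.parent.child = new_node
        return new_node

    def execute(self, frame):
        size = self.size_expr.execute(frame)
        val = self.val_expr.execute(frame)
        return self.execute_evaluated(frame, size, val)

class UninitArrNew(ArrNode):
    def execute_evaluated(self, frame, size, val):
        return self.specialize(size, val).execute_evaluated(frame, size, val)

    def specialize(self, size, val):
        node_cls = StdArrNew if isinstance(val, Lambda) else ArrNewWithValue
        return self.replace(node_cls(self.size_expr, self.val_expr))

class ArrNewWithValue(ArrNode):
    def execute_evaluated(self, frame, size, val):
        return [val] * size   # no per-element eval() call at all

class StdArrNew(ArrNode):
    """Simplified stand-in for StdMethodInvocation: evaluate the lambda per element."""
    def execute_evaluated(self, frame, size, val):
        return [val.fn() for _ in range(size)]

call = Holder(UninitArrNew(Literal(1000), Literal("fast fast fast")))
arr = call.execute(None)
print(type(call.child).__name__, len(arr))  # ArrNewWithValue 1000
```

`[val] * size` stores the same object in every slot, which matches the generic path here because `Object.eval()` returns `this`.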
JUST-IN-TIME COMPILATION FOR INTERPRETERS
Generating Efficient Native Code
How to Get Fast Program Execution?
Standard Compilation: 1 node at a time
VarWrite.execute(frame)        → ..VW_execute()   # bin
IntVarWrite.execute(frame)     → ..IVW_execute()  # bin
VarRead.execute(frame)         → ..VR_execute()   # bin
Literal.execute(frame)         → ..L_execute()    # bin
ArrNewWithValue.execute(frame) → ..ANWV_execute() # bin
Minimal Optimization Potential
Problems with Node-by-Node Compilation
cnt := cnt + 1
• Slow Polymorphic Dispatches
• Runtime checks in general

def IntVarWrite.execute(frame):
  try:
    val := sub_expr.execute_int(frame)
    return execute_eval_int(frame, val)
  except UnexpectedResult as e:
    return respecialize(e.result).execute_evaluated(frame, e.result)
Compilation Unit based on User Program
• Meta-Tracing [2]
• Partial Evaluation Guided By AST [3]
[Diagrams: the AST of the `if` example, once per approach]
RPython
Just-in-Time Compilation with Meta-Tracing
RPython
• Subset of Python
– Type-inferred
• Generates VMs
[Diagram: Interpreter source → RPython Toolchain → a VM with the Interpreter, a Meta-Tracing JIT Compiler, a Garbage Collector, …]
http://rpython.readthedocs.org/
Meta-Tracing of an Interpreter [2]
[AST diagram of the `if` example]
Meta Tracers need to know the Loops
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      jit_merge_point(node=self)
      cond = cond_expr.execute_bool(frame)
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace:
guard(cond_expr == Const(IntLessThan))
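In RPython the merge point is declared through a `JitDriver` that names the green variables (constant for a given position in the user program, here the AST node) and the red variables (the remaining values live in the loop). A sketch, assuming a no-op fallback stub when the RPython toolchain cannot be imported, and two hypothetical nodes for `cnt < 100` and `cnt := cnt + 1`:

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver:
        """Hypothetical no-op stand-in so the sketch runs without the RPython toolchain."""
        def __init__(self, greens, reds):
            self.greens, self.reds = greens, reds

        def jit_merge_point(self, **live_vars):
            pass

# Greens identify the position in the user program; reds change between iterations.
jitdriver = JitDriver(greens=['node'], reds=['frame'])

class WhileNode:
    def __init__(self, cond_expr, body_expr):
        self.cond_expr, self.body_expr = cond_expr, body_expr

    def execute(self, frame):
        while True:
            jitdriver.jit_merge_point(node=self, frame=frame)
            if not self.cond_expr.execute_bool(frame):
                break
            self.body_expr.execute(frame)

class LessThan100:
    """Hypothetical condition node for `cnt < 100`, with cnt in frame slot 0."""
    def execute_bool(self, frame):
        return frame[0] < 100

class Increment:
    """Hypothetical body node for `cnt := cnt + 1`."""
    def execute(self, frame):
        frame[0] += 1

frame = [0]
WhileNode(LessThan100(), Increment()).execute(frame)
print(frame[0])  # 100
```

When run untranslated, the merge point has no effect; it only tells the RPython toolchain where user-level loops begin.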
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
  child left_expr
  child right_expr
  def execute_bool(frame):
    try:
      left = left_expr.execute_int(frame)
    except UnexpectedResult as r:
      ...
    try:
      right = right_expr.execute_int(frame)
    except UnexpectedResult as r:
      ...
    return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace:
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
  final idx
  def execute_int(frame):
    if frame.is_int(idx):
      return frame.local_int[idx]
    else:
      new_node = respecialize()
      raise UnexpectedResult(new_node.execute(frame))

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace:
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
Tracing Records one Concrete Execution
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      jit_merge_point(node=self)
      cond = cond_expr.execute_bool(frame)
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace:
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
guard_true(b1)
...
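The recording step can be approximated in plain Python. This is a hand-instrumented sketch, not how RPython works internally: a real meta-tracer observes the interpreter's own operations automatically, whereas here each node appends tuples to a list. The operation names (`guard_node`, `guard_is_int`, etc.) are made up for the example, and the frame layout mirrors the slides, with `cnt` in slot 1:

```python
def record_dispatch(trace, node):
    # Calling into a child node is a dynamic dispatch; the trace keeps the
    # observed target and guards that it stays the same.
    trace.append(("guard_node", type(node).__name__))

class IntLiteral:
    def __init__(self, value):
        self.value = value

    def execute_int(self, frame, trace):
        trace.append(("const", self.value))
        return self.value

class IntVarRead:
    def __init__(self, idx):
        self.idx = idx

    def execute_int(self, frame, trace):
        is_int = frame["layout"][self.idx] == "int"
        trace.append(("guard_is_int", self.idx))   # the branch taken becomes a guard
        if not is_int:
            raise NotImplementedError("respecialization is not modelled here")
        trace.append(("read_int", self.idx))
        return frame["ints"][self.idx]

class IntLessThan:
    def __init__(self, left_expr, right_expr):
        self.left_expr, self.right_expr = left_expr, right_expr

    def execute_bool(self, frame, trace):
        record_dispatch(trace, self.left_expr)
        left = self.left_expr.execute_int(frame, trace)
        record_dispatch(trace, self.right_expr)
        right = self.right_expr.execute_int(frame, trace)
        trace.append(("int_lt",))
        return left < right

def trace_loop_condition(cond_expr, frame):
    """Record the operations of one concrete evaluation of the loop condition."""
    trace = []
    record_dispatch(trace, cond_expr)
    cond = cond_expr.execute_bool(frame, trace)
    trace.append(("guard_true",) if cond else ("guard_false",))
    return trace

frame = {"layout": {1: "int"}, "ints": {1: 0}}
trace = trace_loop_condition(IntLessThan(IntVarRead(1), IntLiteral(100)), frame)
for op in trace:
    print(op)
```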
Traces are Ideal for Optimization

Recorded trace:
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
guard_true(b1)
...

Intermediate:
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[Const(1)]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
i5 := right_expr.value # Const(100)
b1 := i4 < i5
guard_true(b1)
...

Optimized trace:
a1 := frame.layout
i1 := a1[1]
guard(i1 == F_INT)
a2 := frame.local_int
i2 := a2[1]
b1 := i2 < 100
guard_true(b1)
...
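Two of the simplifications above, dropping guards on AST nodes and folding the constant node fields `idx` and `value` into their uses, can be sketched as a single pass over a trace in SSA-like form. This assumes the node fields are immutable (as `final` and `child` suggest); the tuple encoding and operation names are made up for the example, and the `guard_no_exception` entries are left out for brevity:

```python
def optimize(trace):
    """Drop guards on AST nodes and fold constant node fields into their uses."""
    consts = {}
    out = []
    for result, op, args in trace:
        if op == "guard_node":
            continue                  # the AST is fixed for this trace: guard always holds
        if op == "getfield_const":
            consts[result] = args[1]  # e.g. left_expr.idx is always 1: remember it
            continue
        args = tuple(consts.get(a, a) for a in args)  # constant propagation
        out.append((result, op, args))
    return out

# (result, operation, arguments); names follow the recorded trace.
trace = [
    (None, "guard_node", ("cond_expr", "IntLessThan")),
    (None, "guard_node", ("left_expr", "IntVarRead")),
    ("i1", "getfield_const", ("left_expr.idx", 1)),
    ("a1", "getfield", ("frame.layout",)),
    ("i2", "getitem", ("a1", "i1")),
    (None, "guard_eq", ("i2", "F_INT")),
    ("i3", "getfield_const", ("left_expr.idx", 1)),
    ("a2", "getfield", ("frame.local_int",)),
    ("i4", "getitem", ("a2", "i3")),
    (None, "guard_node", ("right_expr", "IntLiteral")),
    ("i5", "getfield_const", ("right_expr.value", 100)),
    ("b1", "int_lt", ("i4", "i5")),
    (None, "guard_true", ("b1",)),
]

for entry in optimize(trace):
    print(entry)
```

The result has the same shape as the optimized trace: two field reads, one layout guard, one comparison against the constant 100, and the loop guard.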
Truffle + Graal
Just-in-Time Compilation with Partial Evaluation
Oracle Labs
Truffle+Graal
• Java framework
– AST interpreters
• Based on HotSpot JVM
[Diagram: HotSpot JVM + Truffle Framework, containing the Interpreter, the Graal Compiler + Truffle Partial Evaluator, a Garbage Collector, …]
http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html
http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html
Partial Evaluation Guided By AST [3]
[AST diagram of the `if` example]
Partial Evaluation inlines based on Runtime Constants
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      cond = cond_expr.execute_bool(frame)
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLessThan(ASTNode):
  child left_expr
  child right_expr
  def execute_bool(frame):
    try:
      left = left_expr.execute_int(frame)
    except UnexpectedResult as r:
      ...
    try:
      right = right_expr.execute_int(frame)
    except UnexpectedResult as r:
      ...
    return left < right
Partial Evaluation inlines based on Runtime Constants
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      try:
        left = cond_expr.left_expr.execute_int(frame)
      except UnexpectedResult as r:
        ...
      try:
        right = cond_expr.right_expr.execute_int(frame)
      except UnexpectedResult as r:
        ...
      cond = left < right
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntVarRead(ASTNode):
  final idx
  def execute_int(frame):
    if frame.is_int(idx):
      return frame.local_int[idx]
    else:
      new_node = respecialize()
      raise UnexpectedResult(new_node.execute(frame))
Partial Evaluation inlines based on Runtime Constants
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      try:
        if frame.is_int(1):
          left = frame.local_int[1]
        else:
          new_node = respecialize()
          raise UnexpectedResult(new_node.execute(frame))
      except UnexpectedResult as r:
        ...
      try:
        right = cond_expr.right_expr.execute_int(frame)
      except UnexpectedResult as r:
        ...
      cond = left < right
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
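The effect of this inlining can be mimicked in a few lines of Python. The sketch below is not a partial evaluator: each node has a hypothetical `residual()` method that emits its `execute` logic as source text with the node's constant fields already substituted, and `exec` turns the result into one function. It leaves out the type checks and exception paths shown above, and `IntIncrement` is a hypothetical node for `cnt := cnt + 1`:

```python
class IntVarRead:
    def __init__(self, idx):
        self.idx = idx

    def residual(self):
        # Inline execute_int with the constant idx substituted.
        return f"frame.local_int[{self.idx}]"

class IntLiteral:
    def __init__(self, value):
        self.value = value

    def residual(self):
        return repr(self.value)

class IntLessThan:
    def __init__(self, left_expr, right_expr):
        self.left_expr, self.right_expr = left_expr, right_expr

    def residual(self):
        return f"({self.left_expr.residual()} < {self.right_expr.residual()})"

class IntIncrement:
    """Hypothetical node for `cnt := cnt + 1` on local slot idx."""
    def __init__(self, idx):
        self.idx = idx

    def residual(self):
        return f"frame.local_int[{self.idx}] += 1"

class WhileNode:
    def __init__(self, cond_expr, body_expr):
        self.cond_expr, self.body_expr = cond_expr, body_expr

    def residual(self):
        return ("def compiled(frame):\n"
                f"    while {self.cond_expr.residual()}:\n"
                f"        {self.body_expr.residual()}\n")

def specialize_to_function(node):
    """Turn the AST into one straight-line Python function (a stand-in for PE + Graal)."""
    namespace = {}
    exec(node.residual(), namespace)
    return namespace["compiled"]

class Frame:
    def __init__(self):
        self.local_int = {1: 0}   # cnt in slot 1, as on the slides

loop = WhileNode(IntLessThan(IntVarRead(1), IntLiteral(100)), IntIncrement(1))
print(loop.residual())
compiled = specialize_to_function(loop)
frame = Frame()
compiled(frame)
print(frame.local_int[1])  # 100
```

The generated function contains no node objects and no dispatch, only the operations that the AST's constants determined.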
Optimize Optimistically
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      if frame.is_int(1):
        left = frame.local_int[1]
      else:
        __deopt_return_to_interp()
      try:
        right = cond_expr.right_expr.execute_int(frame)
      except UnexpectedResult as r:
        ...
      cond = left < right
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
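A Python sketch of this optimistic version, assuming that deoptimization can be modeled as an exception caught by a hypothetical `run` driver, which then continues in a generic interpreter loop. `Deoptimize`, `compiled_loop`, and `interpreted_loop` are illustrative names, not part of Truffle's API:

```python
class Deoptimize(Exception):
    """Signals that an assumption of the compiled code no longer holds."""

class Frame:
    def __init__(self, value):
        self.locals = {1: value}   # cnt in slot 1

    def is_int(self, idx):
        return type(self.locals[idx]) is int

def interpreted_loop(frame):
    """Slow but general path: works for any value supporting < and +."""
    while frame.locals[1] < 100:
        frame.locals[1] = frame.locals[1] + 1

def compiled_loop(frame):
    """Fast path that optimistically assumes `cnt` is an int."""
    while True:
        if not frame.is_int(1):
            raise Deoptimize()   # plays the role of __deopt_return_to_interp()
        if not frame.locals[1] < 100:
            break
        frame.locals[1] += 1

def run(frame):
    try:
        compiled_loop(frame)
        return "compiled"
    except Deoptimize:
        interpreted_loop(frame)   # continue from the current state
        return "interpreter"

print(run(Frame(0)), run(Frame(0.5)))  # compiled interpreter
```

Resuming is safe in this sketch only because the check runs before the iteration modifies anything; a real VM has to reconstruct the interpreter's state at the deoptimization point.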
Classic Optimizations:
Dead Code Elimination
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      if frame.is_int(1):
        left = frame.local_int[1]
      else:
        __deopt_return_to_interp()
      try:
        right = 100
      except UnexpectedResult as r:
        ...
      cond = left < right
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
  final value
  def execute_int(frame):
    return value
Classic Optimizations:
Constant Propagation
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      if frame.is_int(1):
        left = frame.local_int[1]
      else:
        __deopt_return_to_interp()
      right = 100
      cond = left < right
      if not cond:
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
  final value
  def execute_int(frame):
    return value
Classic Optimizations:
Loop Invariant Code Motion
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    while True:
      if frame.is_int(1):
        left = frame.local_int[1]
      else:
        __deopt_return_to_interp()
      if not (left < 100):
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
Classic Optimizations:
Loop Invariant Code Motion
class WhileNode(ASTNode):
  child cond_expr
  child body_expr
  def execute(frame):
    if not frame.is_int(1):
      __deopt_return_to_interp()
    while True:
      if not (frame.local_int[1] < 100):
        break
      body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
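That these rewrites preserve behavior can be checked on a small Python model. Both functions below are hand-written and no optimizer is involved: `after_inlining` mirrors the code before dead code elimination, and `after_classic_opts` the final loop. `Frame`, `Deoptimize`, `UnexpectedResult`, and `body` are hypothetical scaffolding:

```python
class Deoptimize(Exception):
    pass

class UnexpectedResult(Exception):
    pass

class Frame:
    def __init__(self, cnt):
        self.local_int = {1: cnt}
        self.types = {1: type(cnt)}

    def is_int(self, idx):
        return self.types[idx] is int

def body(frame):
    frame.local_int[1] += 1            # cnt := cnt + 1 (keeps the slot an int)

def after_inlining(frame):
    """Shape of the code after partial evaluation, before classic optimizations."""
    while True:
        if frame.is_int(1):
            left = frame.local_int[1]
        else:
            raise Deoptimize()
        try:
            right = 100                # cannot raise, so the handler is dead code
        except UnexpectedResult:
            pass
        cond = left < right
        if not cond:
            break
        body(frame)

def after_classic_opts(frame):
    """After dead code elimination, constant propagation, and loop-invariant code motion."""
    if not frame.is_int(1):            # hoisted: the body never changes the slot's type
        raise Deoptimize()
    while frame.local_int[1] < 100:
        body(frame)

f1, f2 = Frame(0), Frame(0)
after_inlining(f1)
after_classic_opts(f2)
print(f1.local_int[1], f2.local_int[1])  # 100 100
```

Hoisting the type check is only valid because `body` cannot change the type of slot 1; if it could, the check would have to stay inside the loop.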
WHAT’S POSSIBLE FOR A SIMPLE INTERPRETER?
Results
Designed for Teaching:
• Simple
• Conceptual Clarity
• An Interpreter family
– in C, C++, Java, JavaScript, RPython, Smalltalk
Used in the past by:
http://som-st.github.io
Self-Optimizing SOMs

• SOMMT (RTruffleSOM): Meta-Tracing with RPython, JIT compiled, github.com/SOM-st/RTruffleSOM
• SOMPE (TruffleSOM): Partial Evaluation + Graal Compiler on the HotSpot JVM, JIT compiled, github.com/SOM-st/TruffleSOM
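SOMMT gets its JIT compiler from RPython's meta-tracing [2]. The core mechanism, recording what the interpreter actually does during one iteration of a hot user-level loop and then replaying that straight-line trace protected by guards, can be caricatured in plain Python. This is a toy under stated assumptions: the bytecode format, the hot_threshold of 2, and all names are made up; the trace is replayed by Python rather than compiled to machine code; and RPython derives traces from the interpreter automatically, guided by hints placed in the interpreter loop, rather than from hand-written recording code like this.

```python
# Hypothetical bytecode for: while (cnt < 100) { cnt := cnt + 1; }
PROGRAM = [
    ("lt_const", "cnt", 100),  # 0: loop header, flag := cnt < 100
    ("jump_if_false", 4),      # 1: leave the loop when flag is false
    ("add_const", "cnt", 1),   # 2: cnt := cnt + 1
    ("jump", 0),               # 3: back-edge to the loop header
    ("halt",),                 # 4
]


def run_trace(trace, env):
    """Replay a recorded linear trace until a guard fails; return the exit pc."""
    while True:
        flag = None
        for op in trace:
            if op[0] == "lt_const":
                flag = env[op[1]] < op[2]
            elif op[0] == "add_const":
                env[op[1]] += op[2]
            elif op[0] == "guard_true" and not flag:
                return op[1]  # side exit: resume interpreting at this pc


def run_with_tracing(program, env, hot_threshold=2):
    counts, traces = {}, {}
    recording, trace_start = None, None
    pc, flag = 0, None
    while True:
        if recording is None and pc in traces:
            pc = run_trace(traces[pc], env)  # fast path through the trace
            continue
        op = program[pc]
        if op[0] == "halt":
            return env, traces
        if op[0] == "lt_const":
            flag = env[op[1]] < op[2]
            pc += 1
        elif op[0] == "add_const":
            env[op[1]] += op[2]
            pc += 1
        elif op[0] == "jump_if_false":
            if recording is not None:
                if flag:
                    # The branch is resolved; keep only a guard in the trace.
                    recording.append(("guard_true", op[1]))
                else:
                    recording = None  # keep the toy simple: abort this trace
            pc = pc + 1 if flag else op[1]
            continue
        elif op[0] == "jump":
            target = op[1]
            counts[target] = counts.get(target, 0) + 1
            if recording is not None and target == trace_start:
                traces[target] = recording  # loop closed: trace complete
                recording = None
            elif (recording is None and counts[target] >= hot_threshold
                  and target not in traces):
                recording, trace_start = [], target  # hot loop: start recording
            pc = target
            continue
        else:
            raise ValueError("unknown opcode: %r" % (op,))
        if recording is not None:
            recording.append(op)
```

After two interpreted iterations the loop counts as hot, the third iteration is recorded, and the remaining iterations run through run_trace until the guard fails at cnt = 100.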
Java 8 -server vs. SOM+JIT: JIT-compiled Peak Performance

• SOMMT (RPython): 3.5x slower (min. 1.6x, max. 6.3x)
• SOMPE (Truffle+Graal): 2.8x slower (min. 3%, max. 5x)

[Figure: runtime normalized to Java (compiled or interpreted), axis ticks 1, 4, 8, for Compiled SOMMT and Compiled SOMPE on Bounce, BubbleSort, DeltaBlue, Fannkuch, Mandelbrot, NBody, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers]
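Each result in such a plot is a ratio: a benchmark's SOM runtime divided by Java's runtime for the same benchmark, so 1 means as fast as Java. The slide does not say how the overall factors are aggregated; assuming a geometric mean, the usual choice for averaging ratios, the computation looks like this. The benchmark names and runtimes are invented for illustration and are not the measurements behind the slide.

```python
import math


def normalize(runtimes, baseline):
    """Per-benchmark slowdown: runtime divided by the baseline's runtime."""
    return {name: runtimes[name] / baseline[name] for name in runtimes}


def geometric_mean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical runtimes in ms, NOT the data from the slide.
java = {"A": 10.0, "B": 1.0}
som = {"A": 20.0, "B": 8.0}

ratios = normalize(som, java)  # {"A": 2.0, "B": 8.0}
summary = geometric_mean(ratios.values())  # about 4.0, i.e. "4x slower"
lowest, highest = min(ratios.values()), max(ratios.values())
```

Unlike an arithmetic mean, the geometric mean gives the same relative answer whichever implementation is used as the baseline, which is why it is preferred for normalized runtimes.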
Implementation: Smaller Than Lua

KLOC: 1000 lines of code, without blank lines and comments
• Meta-Tracing: SOMMT (RTruffleSOM), 4.2 KLOC
• Partial Evaluation: SOMPE (TruffleSOM), 9.8 KLOC
• Lua 5.3 interp.: 16 KLOC
• V8 JavaScript, HotSpot Java Virtual Machine, Dart VM: between 260 and 562 KLOC each
CONCLUSION
Simple and Fast Interpreters are Possible!
• Self-optimizing AST interpreters
• RPython or Truffle for JIT Compilation
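A self-optimizing AST interpreter [1] rewrites its own nodes at run time: a node starts uninitialized, replaces itself with a version specialized for the values it observes, and falls back to a more general version when that assumption breaks. The Python sketch below shows this life cycle for an addition node. All class names and the replace helper are hypothetical; Truffle offers comparable node replacement in Java (Node.replace), together with the machinery to deoptimize compiled code.

```python
class Node:
    parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace(self, new_node):
        """Overwrite the parent's field that points at this node."""
        for field, value in list(vars(self.parent).items()):
            if value is self:
                setattr(self.parent, field, new_node)
                new_node.parent = self.parent
                return new_node
        raise RuntimeError("node is not a child of its parent")


class Var(Node):
    def __init__(self, name):
        self.name = name

    def execute(self, env):
        return env[self.name]


class Root(Node):
    def __init__(self, body):
        self.body = self.adopt(body)

    def execute(self, env):
        return self.body.execute(env)


class AddNode(Node):
    def __init__(self, left, right):
        self.left = self.adopt(left)
        self.right = self.adopt(right)

    def execute(self, env):
        return self.do_add(self.left.execute(env), self.right.execute(env))


class UninitializedAdd(AddNode):
    def do_add(self, l, r):
        # First execution: choose a specialization from the values seen.
        if type(l) is int and type(r) is int:
            new = IntAdd(self.left, self.right)
        else:
            new = GenericAdd(self.left, self.right)
        return self.replace(new).do_add(l, r)


class IntAdd(AddNode):
    def do_add(self, l, r):
        if type(l) is int and type(r) is int:
            return l + r  # fast path while the assumption holds
        # Assumption broken: rewrite to the generic version for good.
        return self.replace(GenericAdd(self.left, self.right)).do_add(l, r)


class GenericAdd(AddNode):
    def do_add(self, l, r):
        return l + r  # accepts any operands Python's + supports
```

Rewrites only ever move toward more general nodes, so a tree cannot oscillate between specializations: once a node has become GenericAdd, it stays that way.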
Literature on the ideas:
[1] Würthinger et al., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium, 2012, pp. 73-82.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[4] Marr et al., Are We There Yet? Simple Language Implementation Techniques for the 21st Century, IEEE Software 31(5), 2014, pp. 60-67.
RPython
• #pypy on irc.freenode.net
• rpython.readthedocs.org
• Kermit example interpreter: https://bitbucket.org/pypy/example-interpreter
• A tutorial: http://morepypy.blogspot.be/2011/04/tutorial-writing-interpreter-with-pypy.html
• Language implementations: https://www.evernote.com/shard/s130/sh/4d42a591-c540-4516-9911-c5684334bd45/d391564875442656a514f7ece5602210

Truffle
• http://mail.openjdk.java.net/mailman/listinfo/graal-dev
• SimpleLanguage interpreter: https://github.com/OracleLabs/GraalVM/tree/master/graal/com.oracle.truffle.sl/src/com/oracle/truffle/sl
• A tutorial: http://cesquivias.github.io/blog/2014/10/13/writing-a-language-in-truffle-part-1-a-simple-slow-interpreter/
• Project
– http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html
– http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html

Big Thank You! To both communities, for help, answering questions, debugging support, etc…!!!
Languages: Small, Elegant, and Fast!
[Figure: the example AST (cnt := cnt + 1, if, cnt := 0) and the per-benchmark runtimes of Compiled SOMMT and Compiled SOMPE, normalized to Java]

• RPython (SOMMT): 4.2 KLOC, 3.5x slower than Java (min. 1.6x, max. 6.3x)
• Truffle+Graal (SOMPE): 9.8 KLOC, 2.8x slower than Java (min. 3%, max. 5x)
@smarr | http://stefan-marr.de