JRuby: The Hard Parts
Subverting the JVM
All the tricks, hacks, and kludges we’ve used to make JRuby the best off-JVM language impl around.
Intro
• Charles Oliver Nutter
• Principal Software Engineer
• Red Hat, JBoss Polyglot Group
• @headius
Welcome!
• My favorite event of the year
• I’ve only missed one!
• I will quickly talk through JRuby challenges
• Not a comprehensive list. Buy me a beer.
• The rest of you can help solve them
Ruby
• Dynamic, object-oriented language
• Created in the 1990s by Yukihiro Matsumoto
• “matz”
• Matz’s Ruby Interpreter (MRI)
• Inspired by Python, Perl, Lisp, Smalltalk
• Memes: TMTOWTDI, MINASWAN, CoC,
# Output "I love Ruby"
say = "I love Ruby"
puts say

# Output "I *LOVE* RUBY"
say['love'] = "*love*"
puts say.upcase

# Output "I *love* Ruby"
# five times
5.times { puts say }
JRuby
• Ruby for the JVM and JVM for the Ruby
• Started in 2001, dozens of contribs
• Usually the fastest Ruby
• At least 20 paid full-time man years in it
• Sun Microsystems, Engine Yard, Red Hat
Ruby is Hard to Implement!
Making It Go (Fast)
• Parser-generator hacks
• Multiple interpreters
• Multiple compilers
• JVM-specific tricks
Parsing Ruby
• Yacc/Bison-based parse.y, almost 12kloc
• Very complex, not context-free
• No known 100% correct parser that is not YACC-based
JRuby’s Parser
• Jay parser generator
• Maybe 5 projects in the world use it
• Our version of parse.y = 4kloc
• Two pieces, one is for offline parsing
• Works ok, but…
Parser Problems!
• Array initialization > 65k bytecode
• Giant switch won’t JIT
• Outlining the case bodies: better
• Case bodies as runnables in machine: best
• org/jruby/parser/RubyParser$445.class
• Slow at startup (most important time!)
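The “case bodies as runnables” idea can be sketched in Ruby as a language-neutral illustration. JRuby’s real parser is Java generated by Jay; the grammar, production numbers, and the names `ACTIONS`, `reduce`, and `reduce_with_case` below are hypothetical. Instead of one method that holds every reduction in a giant `case`, each production’s body becomes its own small callable in a table, so no single method grows past the JIT’s size limits.

```ruby
# Hypothetical reduction actions for a tiny expression grammar.
# Index = production number; each body is a separate small callable.
ACTIONS = [
  ->(vals) { vals[0] },             # 0: expr   -> term
  ->(vals) { vals[0] + vals[2] },   # 1: expr   -> expr '+' term
  ->(vals) { vals[0] * vals[2] },   # 2: term   -> term '*' factor
  ->(vals) { Integer(vals[0]) }     # 3: factor -> NUMBER
].freeze

# Table-driven dispatch: one array lookup, then a call into a small body.
def reduce(production, vals)
  ACTIONS.fetch(production).call(vals)
end

# The "giant switch" shape it replaces; a real generated parser would
# have hundreds of branches in this one method.
def reduce_with_case(production, vals)
  case production
  when 0 then vals[0]
  when 1 then vals[0] + vals[2]
  when 2 then vals[0] * vals[2]
  when 3 then Integer(vals[0])
  else raise ArgumentError, "unknown production #{production}"
  end
end

samples = { 0 => [7], 1 => [2, "+", 3], 2 => [4, "*", 5], 3 => ["42"] }
samples.each do |production, vals|
  puts "#{production}: #{reduce(production, vals)}"
end
```

Both versions compute the same results; the difference is only in how the code is split into methods, which is what matters to the JVM’s bytecode and JIT limits.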
Interpreter
• At least four interpreters we’ve tried
• Original: visitor-based
• Modified: big switch rather than visitor
• Experimental: stackless instr-based
• Current: direct execution of AST
• Execution state on artificial stack
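“Direct execution of AST” means each node knows how to evaluate itself, while execution state (here, just local variables) lives in a separate frame object, an artificial stack frame, rather than in Java locals. A minimal Ruby sketch of that shape follows; the node classes (`Num`, `LocalVar`, `LocalAsgn`, `Call`, `Seq`) and `Frame` are hypothetical stand-ins for JRuby’s Java node classes such as `CallNoArgNode`.

```ruby
# Stand-in for the artificial frame: holds local variables.
Frame = Struct.new(:locals)

# Integer literal.
Num = Struct.new(:value) do
  def interpret(_frame)
    value
  end
end

# Read a local variable from the frame.
LocalVar = Struct.new(:name) do
  def interpret(frame)
    frame.locals.fetch(name)
  end
end

# Assign a local variable; as in Ruby, the assignment is itself a value.
LocalAsgn = Struct.new(:name, :value_node) do
  def interpret(frame)
    frame.locals[name] = value_node.interpret(frame)
  end
end

# Binary method call such as `a + b`, dispatched dynamically.
Call = Struct.new(:receiver, :method_name, :arg) do
  def interpret(frame)
    receiver.interpret(frame).public_send(method_name, arg.interpret(frame))
  end
end

# Sequence of statements; evaluates to the value of the last one.
Seq = Struct.new(:body) do
  def interpret(frame)
    body.map { |node| node.interpret(frame) }.last
  end
end

# a = 2; a * (a + 3)
program = Seq.new([
  LocalAsgn.new(:a, Num.new(2)),
  Call.new(LocalVar.new(:a), :*, Call.new(LocalVar.new(:a), :+, Num.new(3)))
])
frame = Frame.new({})
result = program.interpret(frame)
puts result   # 10
```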
The New Way
• JRuby 9000 introduces a new IR
• Traditional-style compiler IR
• Register-based
• CFG, semantic analysis, type and constant propagation, all that jazz
• Interpreter has proven it out…JIT next
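To make “register-based” and “constant propagation” concrete, here is a toy straight-line IR in Ruby with an interpreter and a constant-folding pass. It is not JRuby’s IR: the instruction set (`:const`, `:add`, `:mul`, `:ret`), the `run` and `fold_constants` helpers, and the absence of a control-flow graph are simplifications for illustration.

```ruby
# Toy register-based IR; every instruction names its registers explicitly.
#   [:const, dst, value]  dst = value
#   [:add, dst, a, b]     dst = a + b
#   [:mul, dst, a, b]     dst = a * b
#   [:ret, src]           return src
def run(instrs, regs = {})
  regs = regs.dup
  instrs.each do |op, dst, a, b|
    case op
    when :const then regs[dst] = a
    when :add   then regs[dst] = regs.fetch(a) + regs.fetch(b)
    when :mul   then regs[dst] = regs.fetch(a) * regs.fetch(b)
    when :ret   then return regs.fetch(dst)   # for :ret, the second slot is the source
    else raise ArgumentError, "unknown op #{op}"
    end
  end
  nil
end

# Constant propagation + folding over straight-line code (no CFG, so no merges).
def fold_constants(instrs)
  known = {}   # register => constant value, when known
  instrs.map do |op, dst, a, b|
    case op
    when :const
      known[dst] = a
      [:const, dst, a]
    when :add, :mul
      if known.key?(a) && known.key?(b)
        value = op == :add ? known[a] + known[b] : known[a] * known[b]
        known[dst] = value
        [:const, dst, value]
      else
        known.delete(dst)
        [op, dst, a, b]
      end
    else
      [op, dst]
    end
  end
end

code = [
  [:const, :r0, 2],
  [:const, :r1, 3],
  [:add,   :r2, :r0, :r1],   # foldable: both inputs are constants
  [:mul,   :r3, :r2, :x],    # not foldable: :x is an argument register
  [:ret,   :r3]
]
folded = fold_constants(code)
puts run(folded, { x: 4 })   # 20
```

A real compiler IR would run this over a CFG and merge facts at join points; the straight-line version only shows the core idea.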
Mixed-Mode
• JRuby has both interpreter and JIT
• Cost of generating JVM bytecode is high
• Our interpreter runs faster than JVM’s
• A jitted interpreter is (much) faster than unjitted bytecode
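A toy model of mixed mode: a method is interpreted until it has been called more than a threshold number of times, and only then pays the one-time cost of “compiling” it (here into nested Ruby lambdas rather than JVM bytecode). `MixedModeMethod`, the tree format, and the threshold of 3 are assumptions for the sketch, not JRuby’s actual settings or pipeline.

```ruby
class MixedModeMethod
  attr_reader :mode, :calls

  def initialize(tree, threshold: 3)
    @tree = tree
    @threshold = threshold
    @calls = 0
    @mode = :interpreted
    @compiled = nil
  end

  def call(x)
    @calls += 1
    if @compiled.nil? && @calls > @threshold
      @compiled = compile(@tree)   # one-time cost, paid only once the method is hot
      @mode = :jitted
    end
    @compiled ? @compiled.call(x) : interpret(@tree, x)
  end

  private

  # Tree forms: :x (the argument), an Integer, or [op, left, right] with op :+ or :*.
  def interpret(node, x)
    case node
    when :x then x
    when Integer then node
    else
      op, left, right = node
      interpret(left, x).public_send(op, interpret(right, x))
    end
  end

  # "Compile" the tree once into nested lambdas so later calls skip the tree walk.
  def compile(node)
    case node
    when :x then ->(x) { x }
    when Integer then ->(_x) { node }
    else
      op, left, right = node
      l = compile(left)
      r = compile(right)
      ->(x) { l.call(x).public_send(op, r.call(x)) }
    end
  end
end

square_plus_one = MixedModeMethod.new([:+, [:*, :x, :x], 1], threshold: 3)
results = (1..5).map { |i| square_plus_one.call(i) }
p results               # [2, 5, 10, 17, 26]
p square_plus_one.mode  # :jitted
```

Cold code never pays the compile cost, which is the point when generating JVM bytecode is expensive.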
Native Execution
• Early JIT compiler just translated AST
• Bare-minimum semantic analysis
• Eliminate artificial frame use
• One-off opto for frequent patterns
• Too unwieldy to evolve much
New IR JIT
• Builds off IR runtime
• Per-instruction bytecode gen is simple
• JVM frame is like infinite register machine
• Potential to massively improve perf
• Early unboxing numbers…
Numeric loop performance
[Chart: times faster than MRI 2.1 for JRuby 1.7 and Rubinius]
Numeric loop performance
[Chart: times faster than MRI 2.1 for JRuby 1.7, Rubinius, Truffle, Topaz, and JRuby 9k + unboxing]
mandelbrot(500)
[Chart: times faster than MRI 2.1 for JRuby 9k + indy, JRuby 9k + unboxing, and JRuby 9k + Truffle]
Whither Truffle?
• RubyTruffle merged into JRuby
• Same licenses as rest of JRuby
• Chris Seaton continues to work on it
• Very impressive peak numbers
• Startup, steady-state…needs work
• Considering initial use for targeted opto
JVM Tricks
• Lack of class hierarchy analysis in JIT
• Manually split methods to beat limits
• Everything is an expression, so exception-handling has to maintain current stack
• Tweaking JIT flags will just make you sad
• Unsafe
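Because `begin`/`rescue` is an expression in Ruby, a handler can sit in the middle of a larger computation. In the sketch below, `10` is already on the JVM operand stack when `Integer(str)` runs; since the JVM clears the operand stack when an exception is thrown, compiled code has to save such pending values (for example, into locals) before entering the protected region. `parse_or_default` is a hypothetical example.

```ruby
# begin/rescue is an expression: its value feeds the surrounding `10 + (...)`.
def parse_or_default(str, default)
  10 + (begin
          Integer(str)
        rescue ArgumentError
          default
        end)
end

puts parse_or_default("5", 0)     # 15
puts parse_or_default("nope", 2)  # 12
```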
IRubyObject:
    public RubyClass getMetaClass();

RubyBasicObject:
    private RubyClass metaClass;
    public RubyClass getMetaClass() { return metaClass; }

RubyString, RubyArray, RubyObject

obj.getMetaClass()

public static RubyClass metaclass(IRubyObject object) {
    return object instanceof RubyBasicObject
        ? ((RubyBasicObject) object).getMetaClass()
        : object.getMetaClass();
}
Compatibility
• Strings and Encodings
• IO
• Fibers
• Difficult choices
Strings
• All arbitrary-width byte data is String
• Binary data and encoded text alike
• Many supported encodings
• j.l.String, char[] poor options
• Size, data integrity, behavioral differences
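Ruby’s String is a byte buffer tagged with an encoding, and it must carry binary data and even invalid text unchanged. The plain-Ruby snippet below (variable names are illustrative) shows three things a `java.lang.String` or `char[]` representation would not preserve directly: byte length versus character length, binary data, and a truncated UTF-8 sequence that decoding to characters would replace.

```ruby
text = "héllo"                 # UTF-8 literal
char_count = text.length       # 5 characters
byte_count = text.bytesize     # 6 bytes ("é" is two bytes in UTF-8)

blob = "\xFF\xFEdata".b        # binary data is also just a String (ASCII-8BIT)

# A UTF-8 string cut off in the middle of a character.
bad = "caf\xC3".dup.force_encoding(Encoding::UTF_8)
bad.valid_encoding?            # => false, yet the raw bytes are intact
bad.bytes                      # => [99, 97, 102, 195]

# Decoding to characters (what a char[]-based string would need) loses data:
lossy = bad.scrub              # the dangling byte becomes U+FFFD
transcode_failed =
  begin
    bad.encode(Encoding::UTF_16LE)
    false
  rescue Encoding::InvalidByteSequenceError
    true
  end
```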
The First Big Decision
• We realized we needed a byte[] String
• Had been StringBuilder-based until then
• That meant a lot of porting…
• Regex engine (joni)
• Encoding subsystem (jcodings)
• Low-level IO + transcoding (in JRuby)
JOni
• Port of Oniguruma regex library
• Pluggable grammars + arbitrary encodings
• Bytecode engine (shallow call stack)
• Interruptible
• Re-forked as char[] engine for Nashorn
• https://github.com/jruby/joni
[Chart: data ‘a’-‘z’ in byte[], matching /.*tuv(..)yz$/; time in seconds for j.u.regex vs. JOni]
[Chart: data ‘a’-‘z’ read from IO, matching /.*tuv(..)yz$/; time in seconds for j.u.regex vs. JOni]
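The workload in these charts can be reproduced in plain Ruby: the string ‘a’ through ‘z’ and the pattern /.*tuv(..)yz$/, whose group captures "wx". On JRuby the match runs on JOni; on MRI it runs on Onigmo, the Oniguruma fork. The iteration count and timing harness are assumptions, not the benchmark behind the charts, so no particular numbers are implied.

```ruby
require "benchmark"

data = ("a".."z").to_a.join    # "abcdefghijklmnopqrstuvwxyz"
pattern = /.*tuv(..)yz$/

puts data[pattern, 1]          # wx
puts data.b[pattern, 1]        # wx (same match on binary/byte data)

# Assumed iteration count; not the harness used for the charts.
iterations = 100_000
elapsed = Benchmark.realtime do
  iterations.times { pattern.match?(data) }
end
puts format("%d matches in %.3fs", iterations, elapsed)
```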
Jcodings
• Character tables
• Used heavily by JOni and JRuby
• Transcoding tables and logic
• Replaces Charset logic from JRuby 1.7
• https://github.com/jruby/jcodings
NO GRAPH NEEDED
JRuby 9000
• Finished porting, connecting transcoders
• New port of IO operations
• Transcoding works directly against IO buffers; hard to simulate other ways
• Lots of fun native (C) calls to emulate…
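Ruby exposes the buffer-oriented transcoding API directly through `Encoding::Converter#primitive_convert`, which is why it is hard to simulate any other way: the converter must keep state between reads. This sketch feeds it two chunks, standing in for two IO buffer fills, that split a multibyte character; the chunk contents and the UTF-8 to ISO-8859-1 pairing are illustrative choices.

```ruby
# UTF-8 -> ISO-8859-1 converter fed in two chunks, as if from two IO reads.
ec = Encoding::Converter.new(Encoding::UTF_8, Encoding::ISO_8859_1)

# "héllo" in UTF-8, split between the two bytes of "é" (0xC3 0xA9).
chunks = ["h\xC3".dup, "\xA9llo".dup]
dst = String.new

statuses = chunks.map do |chunk|
  # PARTIAL_INPUT: more input may follow, so a trailing partial
  # character is held inside the converter instead of being an error.
  ec.primitive_convert(chunk, dst, nil, nil, Encoding::Converter::PARTIAL_INPUT)
end
statuses << ec.primitive_convert(nil, dst)   # no more input: finish

p statuses    # [:source_buffer_empty, :source_buffer_empty, :finished]
p dst.bytes   # [104, 233, 108, 108, 111]
```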
Fibers
• Coroutines, goroutines, continuations
• MRI uses stack-swapping
• And limits Fiber stack size as a result
• Useless as a concurrency model
• Useful for multiplexing operations
• Try read, no data, go to next fiber
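The multiplexing pattern (“try read, no data, go to next fiber”) can be sketched with Ruby’s own Fiber API. To keep it self-contained, the non-blocking reads are simulated: each source is a hypothetical array of chunks where `nil` means “no data yet”, standing in for something like `IO#read_nonblock`.

```ruby
# Simulated non-blocking sources: nil means "no data ready yet".
sources = {
  a: [nil, "a1", nil, "a2"],
  b: ["b1", nil, "b2"]
}
log = []

fibers = sources.map do |_name, chunks|
  Fiber.new do
    until chunks.empty?
      data = chunks.shift          # "try read"
      if data.nil?
        Fiber.yield                # no data: go to the next fiber
      else
        log << data
      end
    end
  end
end

# Round-robin scheduler: resume each live fiber until all have finished.
until fibers.none?(&:alive?)
  fibers.each { |f| f.resume if f.alive? }
end

p log   # ["b1", "a1", "b2", "a2"]
```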
Fibers on JRuby
• Yep, they’re just native threads
• Transfer perf with j.u.c utils is pretty close
• Resource load is very bad
• Spin-up time is bad without thread pool
• So early or occasional fibers cost a lot
• Where are you, coro?!
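Here is a rough Ruby model of “fibers as native threads”: each fiber is backed by a thread, and control is handed back and forth through two queues, loosely analogous to handoff with java.util.concurrent utilities. `ThreadFiber` and its yielder-argument API are hypothetical, not JRuby’s implementation. Note that an unfinished fiber keeps a parked thread alive, which is the resource cost mentioned above, and that the thread is only spun up on first resume.

```ruby
class ThreadFiber
  def initialize(&body)
    @body = body
    @to_fiber  = Queue.new   # values handed in by #resume
    @to_caller = Queue.new   # [kind, value] pairs handed back out
    @thread = nil
    @done = false
  end

  def alive?
    !@done
  end

  def resume(value = nil)
    raise FiberError, "dead fiber called" if @done
    start_thread unless @thread      # spin-up cost is paid on first resume
    @to_fiber << value
    kind, result = @to_caller.pop    # caller blocks until the fiber hands back
    @done = true if kind != :yield
    raise result if kind == :error
    result
  end

  private

  def start_thread
    @thread = Thread.new do
      first = @to_fiber.pop
      yielder = lambda do |out|
        @to_caller << [:yield, out]
        @to_fiber.pop                # park this thread until resumed again
      end
      begin
        @to_caller << [:done, @body.call(yielder, first)]
      rescue StandardError => e
        @to_caller << [:error, e]
      end
    end
  end
end

f = ThreadFiber.new do |y, x|
  a = y.call(x + 1)   # hand out x + 1, then wait for the next resume value
  y.call(a * 2)
  :finished
end

results = [f.resume(1), f.resume(10), f.resume]   # [2, 20, :finished]
```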
Hard Decisions
• ObjectSpace walks heap, off by default
• Trace functions add overhead, off by default
• Full coroutines not possible
• C extension API too difficult to emulate
• Perhaps only item to really hurt us
Native Integration
• Process control
• More selectable IO
• FFI layer
• C extension API
• Misc
Ruby’s Roots
• Matz is/was a C programmer
• Early Ruby did little more than stitch C calls together
• Some of those roots remain
• ttys, fcntl, process control, IO, ext API
• We knew we needed a solution
JNA, and then JNR
• Started with jna-posix to map POSIX
• stat, symlink, etc needed to do basics
• JNR replaced JNA
• Wayne Meissner started his empire…
The Cancer
• Many off-platform runtimes are not as good as HotSpot
• Many of their users must turn to C for perf
• So, since many people use C exts on MRI, maybe we need to implement it?
• Or get a student to do it…
MRI C Extensions
• Very invasive API
• Direct pointer access, object internals, conservative GC, threading constraints
• Like bridging one JNI to another
• Experimental in JRuby 1.6, gone in 1.7
• Will not revisit unless new API
FFI
• Ruby API/DSL for binding C libs
• Additional tools for generating that code
• If you need to go native, it’s the best way
• In use in production JRuby apps
• ØMQ client, bson lib, sodium crypto, …
Ruby FFI example
require 'ffi'

class Timeval < FFI::Struct
  layout :tv_sec => :ulong,
         :tv_usec => :ulong
end

module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :gettimeofday,
                  [ :pointer, :pointer ],
                  :int
end

t = Timeval.new
LibC.gettimeofday(t.pointer, nil)
Layered Runtime
• jffi
• jnr-ffi
• libffi
• jnr-posix
• jnr-constants
• jnr-enxio
• jnr-x86asm
• jnr-unixsocket
• etc. etc.
Native in JRuby
• POSIX stuff missing from Java
• Ruby FFI DSL for binding C libs
• Stdio
• selection, remove buffering, control tty
• Process launching and control
Process Control
• Java’s ProcessBuilder/Process are bad
• No channel access (no select!)
• Spins up at least one thread per process
• Drains child output ahead of you
• New process API based on posix_spawn
in_c, in_p = IO.pipe
out_p, out_c = IO.pipe

pid = spawn('cat -n',
            :in => in_c,
            :out => out_c,
            :err => 'error.log')

[in_c, out_c].each(&:close)

in_p.puts("hello, world")
in_p.close

puts out_p.read # => " 1 hello, world"
Process.waitpid(pid)
Usability
• Backtraces
• Command-line and launchers
• Startup time
Backtraces
• JVM backtraces make Rubyists’ eyes bleed
• Initially, Ruby trace maintained manually
• JIT emits mangled class to produce a Ruby trace element
• AOT produces single class, mangled method name
• Mixed-mode backtraces!
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.Closure.call(Closure.java:279)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:1911)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1184)
at org.codehaus.groovy.runtime.dgm$88.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:270)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
at BootStrap.populateBootstrapData(BootStrap.groovy:786)
at BootStrap.this$2$populateBootstrapData(BootStrap.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1009)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.jav
at org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:362)
at org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:50)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
at org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.RootNode.interpret(RootNode.java:129)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
at org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:184)
at org.jruby.RubyKernel.evalCommon(RubyKernel.java:1158)
at org.jruby.RubyKernel.eval19(RubyKernel.java:1121)
at org.jruby.RubyKernel$INVOKER$s$0$3$eval19.call(RubyKernel$INVOKER$s$0$3$eval19.gen)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)
at ruby.__dash_e__.method__1$RUBY$bar(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.block_0$RUBY$foo(-e:1)
at ruby$__dash_e__$block_0$RUBY$foo.call(ruby$__dash_e__$block_0$RUBY$foo)
at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:230)
at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.__file__(-e:1)
at ruby.__dash_e__.load(-e)
• org.jruby.RubyFixnum.times
• org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL
• rubyjit.Object$$foo_3AB1F5052668B3CD74A0B4CD4999CF6A65E92973271627940.__file__
• ruby.__dash_e__.method__0$RUBY$foo
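The compiled frames in the JRuby trace above follow a recognizable shape, so a Ruby-level trace can be recovered by pattern matching. This sketch assumes only the forms visible there (`ruby.<script>.<prefix>$RUBY$<name>(<file>:<line>)` for methods and blocks, and `...__file__(...)` for the top level); the regexes and the `ruby_backtrace` helper are hypothetical. JRuby’s actual mechanism also has to account for interpreted frames and core methods written in Java (such as `RubyFixnum.times`), which this sketch simply drops.

```ruby
# Compiled Ruby method or block: ruby.<script>.<prefix>$RUBY$<name>(<file>:<line>)
COMPILED_FRAME = /\Aat ruby\.[\w$]+\.(?<prefix>\w+)\$RUBY\$(?<name>\w+)\((?<file>[^:)]+):(?<line>\d+)\)\z/
# Top-level script body: ruby.<script>.__file__(<file>:<line>)
MAIN_FRAME = /\Aat ruby\.[\w$]+\.__file__\((?<file>[^:)]+):(?<line>\d+)\)\z/

# Keep only frames that map back to Ruby code; drop runtime/JVM plumbing.
def ruby_backtrace(java_frames)
  java_frames.filter_map do |frame|
    line = frame.strip
    if (m = COMPILED_FRAME.match(line))
      label = m[:prefix].start_with?("block") ? "block in #{m[:name]}" : m[:name]
      "#{m[:file]}:#{m[:line]}:in `#{label}'"
    elsif (m = MAIN_FRAME.match(line))
      "#{m[:file]}:#{m[:line]}:in `<main>'"
    end
  end
end

frames = [
  'at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)',
  'at ruby.__dash_e__.method__1$RUBY$bar(-e:1)',
  'at ruby.__dash_e__.block_0$RUBY$foo(-e:1)',
  'at ruby$__dash_e__$block_0$RUBY$foo.call(ruby$__dash_e__$block_0$RUBY$foo)',
  'at org.jruby.RubyFixnum.times(RubyFixnum.java:275)',
  'at ruby.__dash_e__.method__0$RUBY$foo(-e:1)',
  'at ruby.__dash_e__.__file__(-e:1)',
  'at ruby.__dash_e__.load(-e)'
]
puts ruby_backtrace(frames)
```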
Command Line
• Rubyists typically are at CLI
• Command line and tty must behave
• Epic bash and .bat scripts
• 300-500 lines of heinous shell script
• Unusable in shebang lines
• Repurposed NetBeans native launcher
system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real    0m0.126s
user    0m0.092s
sys     0m0.031s

system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real    0m0.124s
user    0m0.089s
sys     0m0.033s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real    0m0.106s
user    0m0.080s
sys     0m0.022s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real    0m0.110s
user    0m0.085s
sys     0m0.023s
Console Support
• Rubyists also typically use REPLs
• Readline support is a must
• jline has been forked all over the place
• Looking into JNA-based readline now
CLI == Startup Time
• BY FAR the #1 complaint
• May be the only reason we haven’t won!
• We’re trying everything we can
JRuby Startup
[Chart: time in seconds (lower is better) for -e 1, gem --help, and rake -T: C Ruby vs. JRuby]
Tweaking Flags
• -client mode
• -XX:+TieredCompilation -XX:TieredStopAtLevel=1
• -X-C to disable JRuby’s compiler
• Heap sizes, code verification, etc etc
Nailgun?
• Keep a single JVM running in background
• Toss commands over to it
• It stays hot, so code starts faster
• Hard to clean up all state (e.g. threads)
• Can’t get access to user’s terminal
• http://www.martiansoftware.com/nailgun/
Drip
[Diagram: Isolated JVM / Application / Command #1, shown three times]
Drip
• Start a new JVM after each command
• Pre-boot JVM plus optional code
• Analyze command line for differences
• Age out unused instances
• https://github.com/flatland/drip
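Drip’s bookkeeping can be modeled in a few lines: keep one pre-booted instance per distinct command-line key, hand it out when a matching command arrives, immediately boot a replacement, and drop spares nobody has asked for recently. This is a toy model, not Drip’s code: `WarmPool`, the string key (assumed to be derived from the JVM options), the TTL, and the simulated boot block are all assumptions, and no real JVMs are started.

```ruby
class WarmPool
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &boot)
    @ttl = ttl
    @clock = clock
    @boot = boot      # simulated "start a JVM with these options"
    @spare = {}       # key => [instance, time it was pre-booted]
  end

  # Hand out the warm spare for this key (or boot one now if none is
  # waiting), then pre-boot a replacement for the next command.
  def checkout(key)
    instance, _booted_at = @spare.delete(key)
    instance ||= @boot.call(key)
    @spare[key] = [@boot.call(key), @clock.call]
    instance
  end

  # Age out spares that have sat unused for longer than the TTL.
  def reap
    now = @clock.call
    @spare.reject! { |_key, (_instance, booted_at)| now - booted_at > @ttl }
    self
  end

  def warm_keys
    @spare.keys
  end
end

now = 0
booted = []
pool = WarmPool.new(ttl: 10, clock: -> { now }) do |key|
  booted << key
  "jvm-#{booted.size}"
end

first  = pool.checkout("-Xmx1g")   # cold: boots jvm-1 to run now, jvm-2 as spare
second = pool.checkout("-Xmx1g")   # warm: gets jvm-2, pre-boots jvm-3
now = 20
pool.reap                          # jvm-3 sat unused past the TTL and is dropped
```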
Drip Init
• Give Drip some code to pre-boot
• Load more libraries
• Warm up some code
• Pre-execution initialization
• Run as much as possible in background
• We also pre-load ./dripmain.rb if exists
$ cat dripmain.rb
# Preload some code Rails always needs
require File.expand_path('../config/application', __FILE__)
JRuby Startup
[Chart: rake -T, time in seconds (lower is better): C Ruby, JRuby, JRuby (best), JRuby (drip), JRuby (drip init), JRuby (dripmain)]
CONCLUSION
Hard Parts
• 64k bytecode limit
• Falling over JIT limits
• String char[] pain
• Startup and warmup
• Coroutines
• FFI at JVM level
• Too many flags
• Tiered compiler slow
• Interpreter opto
• Bytecode is a blunt tool
• Indy has taken too long
• Charlie may burn out
Thank You!
• Charles Oliver Nutter
• @headius
• http://blog.headius.com