Transcript
Page 1: JRuby: The Hard Parts

The Hard Parts

Page 2: JRuby: The Hard Parts

Subverting the JVM

All the tricks, hacks, and kludges we’ve used to make JRuby the best off-JVM language impl around.

Page 3: JRuby: The Hard Parts

Intro

• Charles Oliver Nutter

• Principal Software Engineer

• Red Hat, JBoss Polyglot Group

• @headius


Page 4: JRuby: The Hard Parts

Welcome!

• My favorite event of the year

• I’ve only missed one!

• I will quickly talk through JRuby challenges

• Not a comprehensive list. Buy me a beer.

• Rest of you can help solve them

Page 5: JRuby: The Hard Parts

Ruby

• Dynamic, object-oriented language

• Created in 90s by Yukihiro Matsumoto

• “matz”

• Matz’s Ruby Interpreter (MRI)

• Inspired by Python, Perl, Lisp, Smalltalk

• Memes: TMTOWTDI, MINASWAN, CoC,

Page 6: JRuby: The Hard Parts

# Output "I love Ruby"
say = "I love Ruby"
puts say

# Output "I *LOVE* RUBY"
say['love'] = "*love*"
puts say.upcase

# Output "I *love* Ruby"
# five times
5.times { puts say }

Page 7: JRuby: The Hard Parts

JRuby

• Ruby for the JVM and JVM for the Ruby

• Started in 2001, dozens of contribs

• Usually the fastest Ruby

• At least 20 paid full-time man years in it

• Sun Microsystems, Engine Yard, Red Hat

Page 8: JRuby: The Hard Parts

Ruby is Hard to Implement!

Page 9: JRuby: The Hard Parts

Making It Go (Fast)

• Parser-generator hacks

• Multiple interpreters

• Multiple compilers

• JVM-specific tricks

Page 10: JRuby: The Hard Parts

Parsing Ruby

• Yacc/Bison-based parse.y, almost 12kloc

• Very complex, not context-free

• No known 100% correct parser that is not YACC-based
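One small illustration of why the grammar is not context-free: the text "a +1" parses as a method call with the argument +1, or as binary addition, depending on whether the parser has already seen "a" assigned as a local variable. The sketch below is illustrative only; the names a, as_call and as_local are made up for this example and do not come from the talk.

```ruby
# Here `a` is only a method, so `a +1` is parsed as the call a(+1).
def a(x = 10)
  x * 2
end

as_call = a +1      # parsed as a(+1)

# Once `a` has been assigned, the parser treats it as a local variable,
# and the very same text is parsed as the binary expression a + 1.
a = 5
as_local = a +1     # parsed as a + 1

puts as_call
puts as_local
```

The parser has to carry the set of known local variables through the lexer to make this decision, which is part of what makes a hand-written or non-Yacc parser hard to get fully right.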

Page 11: JRuby: The Hard Parts
Page 12: JRuby: The Hard Parts
Page 13: JRuby: The Hard Parts
Page 14: JRuby: The Hard Parts

JRuby’s Parser

• Jay parser generator

• Maybe 5 projects in the world use it

• Our version of parse.y = 4kloc

• Two pieces, one is for offline parsing

• Works ok, but…

Page 15: JRuby: The Hard Parts

Parser Problems!

• Array initialization > 65k bytecode

• Giant switch won’t JIT

• Outlining the case bodies: better

• Case bodies as runnables in machine: best

• org/jruby/parser/RubyParser$445.class

• Slow at startup (most important time!)
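The "case bodies as runnables" idea can be sketched in Ruby as a table of small callables, one per grammar rule, with a tiny driver that looks them up. This is only an analogy for the JVM-level trick (where each body ends up in its own class, such as the RubyParser$445.class named above); the rule numbers and actions here are hypothetical.

```ruby
# Hypothetical reduce actions, one small callable per grammar rule.
# In a generated parser these would be the bodies of one giant switch.
ACTIONS = {
  1 => ->(stack) { stack.push(stack.pop + stack.pop) },  # expr: expr '+' expr
  2 => ->(stack) { stack.push(stack.pop * stack.pop) },  # expr: expr '*' expr
  3 => ->(stack) { stack.push(-stack.pop) }              # expr: '-' expr
}.freeze

# The driver stays tiny: look up the action for a rule and run it.
def run_actions(rule_ids, stack)
  rule_ids.each { |id| ACTIONS.fetch(id).call(stack) }
  stack
end

result = run_actions([1, 3], [2, 3, 4])
puts result.inspect
```

Keeping the driver small matters on the JVM because a method that is too large is not JIT-compiled at all, while many small methods each stay under the limits.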

Page 16: JRuby: The Hard Parts

Interpreter

• At least four interpreters we’ve tried

• Original: visitor-based

• Modified: big switch rather than visitor

• Experimental: stackless instr-based

• Current: direct execution of AST

• Execution state on artificial stack
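A minimal sketch of "direct execution of AST", assuming each node class carries its own interpret method (the JRuby trace later in the deck shows frames such as CallNoArgNode.interpret). The node classes below are simplified stand-ins written for this example, not JRuby's real ones.

```ruby
# Each node knows how to interpret itself: no visitor, no central switch.
class LiteralNode
  def initialize(value)
    @value = value
  end

  def interpret(_env)
    @value
  end
end

class LocalVarNode
  def initialize(name)
    @name = name
  end

  def interpret(env)
    env.fetch(@name)
  end
end

class AddNode
  def initialize(left, right)
    @left = left
    @right = right
  end

  def interpret(env)
    @left.interpret(env) + @right.interpret(env)
  end
end

# AST for the expression: x + 2
tree = AddNode.new(LocalVarNode.new(:x), LiteralNode.new(2))
value = tree.interpret({ x: 40 })
puts value
```

Here the hash passed as env plays the role of the artificial stack that holds execution state outside the host language's own frames.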

Page 17: JRuby: The Hard Parts

The New Way

• JRuby 9000 introduces a new IR

• Traditional-style compiler IR

• Register-based

• CFG, semantic analysis, type and constant propagation, all that jazz

• Interpreter has proven it out…JIT next
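To make "register-based" concrete, here is a toy interpreter for a three-address instruction list. The opcodes, register names and instruction layout are invented for this sketch and are not JRuby's actual IR.

```ruby
# Hypothetical three-address instructions: [opcode, destination, operands...]
PROGRAM = [
  [:load_const, :r0, 6],
  [:load_const, :r1, 7],
  [:mul,        :r2, :r0, :r1],
  [:return,     :r2]
].freeze

# Walk the instruction list, reading and writing named registers.
def run_ir(program)
  regs = {}
  program.each do |op, dest, *args|
    case op
    when :load_const then regs[dest] = args[0]
    when :mul        then regs[dest] = regs[args[0]] * regs[args[1]]
    when :return     then return regs[dest]
    else raise ArgumentError, "unknown opcode #{op}"
    end
  end
  nil
end

answer = run_ir(PROGRAM)
puts answer
```

Because every value lives in an explicitly named register, passes such as constant propagation can rewrite the instruction list before it is interpreted or compiled.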

Page 18: JRuby: The Hard Parts

Mixed-Mode

• JRuby has both interpreter and JIT

• Cost of generating JVM bytecode is high

• Our interpreter runs faster than JVM’s

• A jitted interpreter is (much) faster than unjitted bytecode
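The mixed-mode idea can be sketched as a method wrapper that interprets until a call counter crosses a threshold, then switches to a compiled form. The class name, the threshold of 3, and the lambdas standing in for generated bytecode are all assumptions made for this example.

```ruby
# Hypothetical mixed-mode method: interpret until hot, then switch to a
# "compiled" body. A lambda stands in for generated JVM bytecode.
class MixedModeMethod
  attr_reader :calls

  def initialize(threshold, interpreted, compiler)
    @threshold = threshold
    @interpreted = interpreted
    @compiler = compiler
    @compiled = nil
    @calls = 0
  end

  def compiled?
    !@compiled.nil?
  end

  def call(*args)
    return @compiled.call(*args) if @compiled

    @calls += 1
    # Pay the compilation cost only once the method has proven to be hot.
    @compiled = @compiler.call if @calls >= @threshold
    @interpreted.call(*args)
  end
end

square = MixedModeMethod.new(
  3,
  ->(n) { n * n },          # interpreted path
  -> { ->(n) { n * n } }    # "compiler" returning the fast path
)

results = 5.times.map { |i| square.call(i) }
puts results.inspect
puts square.compiled?
```

Code that runs only once or twice never pays for bytecode generation, which is the point of the bullets above.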

Page 19: JRuby: The Hard Parts

Native Execution

• Early JIT compiler just translated AST

• Bare-minimum semantic analysis

• Eliminate artificial frame use

• One-off opto for frequent patterns

• Too unwieldy to evolve much

Page 20: JRuby: The Hard Parts

New IR JIT

• Builds off IR runtime

• Per-instruction bytecode gen is simple

• JVM frame is like infinite register machine

• Potential to massively improve perf

• Early unboxing numbers…

Page 21: JRuby: The Hard Parts

Numeric loop performance

[Bar chart: times faster than MRI 2.1, axis 0 to 5. Bars: JRuby 1.7, Rubinius.]

Page 22: JRuby: The Hard Parts

Numeric loop performance

[Bar chart: times faster than MRI 2.1, axis 0 to 60. Bars: JRuby 1.7, Rubinius, Truffle, Topaz, 9k+unbox.]

Page 23: JRuby: The Hard Parts

mandelbrot(500)

[Bar chart: times faster than MRI 2.1, axis 0 to 40. Bars: JRuby 9k + indy, JRuby 9k + unboxing, JRuby 9k + Truffle.]

Page 24: JRuby: The Hard Parts

Whither Truffle?

• RubyTruffle merged into JRuby

• Same licenses as rest of JRuby

• Chris Seaton continues to work on it

• Very impressive peak numbers

• Startup, steady-state…needs work

• Considering initial use for targeted opto

Page 25: JRuby: The Hard Parts

JVM Tricks

• Lack of class hierarchy analysis in JIT

• Manually split methods to beat limits

• Everything is an expression, so exception-handling has to maintain current stack

• Tweaking JIT flags will just make you sad

• Unsafe

Page 26: JRuby: The Hard Parts

IRubyObject
    public RubyClass getMetaClass();

RubyBasicObject
    private RubyClass metaClass;
    public RubyClass getMetaClass() { return metaClass; }

RubyString, RubyArray, RubyObject

Call site: obj.getMetaClass()

Page 27: JRuby: The Hard Parts

public static RubyClass metaclass(IRubyObject object) {
    return object instanceof RubyBasicObject
        ? ((RubyBasicObject) object).getMetaClass()
        : object.getMetaClass();
}

Page 28: JRuby: The Hard Parts

Compatibility

• Strings and Encodings

• IO

• Fibers

• Difficult choices

Page 29: JRuby: The Hard Parts

Strings

• All arbitrary-width byte data is String

• Binary data and encoded text alike

• Many supported encodings

• j.l.String, char[] poor options

• Size, data integrity, behavioral differences
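The bullets above can be seen directly from Ruby: a String is a byte sequence plus an encoding tag, so the same bytes can be text or raw data. This is a small sketch using only core String methods; the sample values are arbitrary.

```ruby
text = "h\u00E9llo"                      # five characters of UTF-8 text
chars = text.length                      # counts characters
bytes = text.bytesize                    # counts bytes; the accented e takes two

# The same bytes viewed as raw binary data: every byte is its own "character".
raw = text.dup.force_encoding(Encoding::BINARY)
raw_length = raw.length

# Transcoding produces different bytes for the same text.
latin1 = text.encode(Encoding::ISO_8859_1)
latin1_bytes = latin1.bytesize

# Arbitrary binary data is still a String, even when it is not valid text.
blob = [0xFF, 0xFE, 0x00].pack("C*")
valid_utf8 = blob.dup.force_encoding(Encoding::UTF_8).valid_encoding?

puts [chars, bytes, raw_length, latin1_bytes, valid_utf8].inspect
```

A java.lang.String or char[] cannot represent blob without either widening every byte or risking corruption on decode, which is the size and data-integrity problem the slide refers to.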

Page 30: JRuby: The Hard Parts

The First Big Decision

• We realized we needed a byte[] String

• Had been StringBuilder-based until then

• That meant a lot of porting…

• Regex engine (joni)

• Encoding subsystem (jcodings)

• Low-level IO + transcoding (in JRuby)

Page 31: JRuby: The Hard Parts

JOni

• Port of Oniguruma regex library

• Pluggable grammars + arbitrary encodings

• Bytecode engine (shallow call stack)

• Interruptible

• Re-forked as char[] engine for Nashorn

• https://github.com/jruby/joni

Page 32: JRuby: The Hard Parts

Data: ‘a’-‘z’ in byte[]. Match /.*tuv(..)yz$/

[Bar chart: time in seconds, axis 0s to 6s. Bars: j.u.regex, JOni.]

Page 33: JRuby: The Hard Parts

Data: ‘a’-‘z’ from IO. Match /.*tuv(..)yz$/

[Bar chart: time in seconds, axis 0s to 2.8s. Bars: j.u.regex, JOni.]
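For reference, the input and pattern used in both charts can be reproduced in a few lines of Ruby. This shows only what is being matched, against a byte-oriented (binary) string; the timing harness behind the charts is not part of the transcript and is not reconstructed here.

```ruby
data = ('a'..'z').to_a.join      # the 26 lowercase letters in order
bytes = data.b                   # same bytes, tagged as binary data
pattern = /.*tuv(..)yz$/

match = pattern.match(bytes)
captured = match[1]              # the two characters between "tuv" and "yz"
puts captured
```

With a byte[]-based engine the match runs against the bytes as they are; a java.util.regex match needs the data decoded into characters first.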

Page 34: JRuby: The Hard Parts

Jcodings

• Character tables

• Used heavily by JOni and JRuby

• Transcoding tables and logic

• Replaces Charset logic from JRuby 1.7

• https://github.com/jruby/jcodings

Page 35: JRuby: The Hard Parts

NO GRAPH NEEDED

Page 36: JRuby: The Hard Parts

JRuby 9000

• Finished porting, connecting transcoders

• New port of IO operations

• Transcoding works directly against IO buffers; hard to simulate other ways

• Lots of fun native (C) calls to emulate…

Page 37: JRuby: The Hard Parts

Fibers

• Coroutines, goroutines, continuations

• MRI uses stack-swapping

• And limits Fiber stack size as a result

• Useless as a concurrency model

• Useful for multiplexing operations

• Try read, no data, go to next fiber
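The "try read, no data, go to next fiber" pattern can be sketched with core Fiber. Here arrays stand in for IO sources, with nil meaning "no data yet" and :eof marking the end; the reader helper and the round-robin loop are written for this example.

```ruby
# Each fiber tries to read from its own inbox. When nothing is ready it
# yields, so the scheduler can move on to the next fiber.
def reader(name, inbox, log)
  Fiber.new do
    loop do
      item = inbox.shift
      if item.nil?
        Fiber.yield :no_data      # nothing to read right now
      elsif item == :eof
        break
      else
        log << "#{name} read #{item}"
      end
    end
    :done
  end
end

log = []
fibers = [
  reader("A", ["a1", nil, "a2", :eof], log),
  reader("B", ["b1", nil, "b2", :eof], log)
]

# Round-robin scheduler: resume each fiber, drop the ones that finished.
until fibers.empty?
  fibers.reject! { |fiber| fiber.resume == :done }
end

puts log
```

Everything runs on one thread, so this multiplexes work but does not add parallelism, which matches the "useless as a concurrency model" bullet.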

Page 38: JRuby: The Hard Parts

Fibers on JRuby

• Yep, they’re just native threads

• Transfer perf with j.u.c utils is pretty close

• Resource load is very bad

• Spin-up time is bad without thread pool

• So early or occasional fibers cost a lot

• Where are you, coro?!
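A rough sketch of how a fiber can be emulated with a native thread and a pair of hand-off queues, so that only one side runs at a time. ThreadFiber and its pause method are hypothetical names; this is not JRuby's implementation, which the slide says is built on java.util.concurrent utilities.

```ruby
# Hypothetical thread-backed fiber. The body runs on its own thread, and
# control is passed back and forth through two blocking queues.
class ThreadFiber
  def initialize(&body)
    @to_fiber = Queue.new
    @to_caller = Queue.new
    @body = body
    @thread = nil
  end

  # Caller side: wake the fiber thread, then block until it pauses or ends.
  def resume(value = nil)
    @thread ||= Thread.new do
      first = @to_fiber.pop
      @to_caller << @body.call(self, first)
    end
    @to_fiber << value
    @to_caller.pop
  end

  # Fiber side: hand a value to the caller, then block until resumed again.
  def pause(value = nil)
    @to_caller << value
    @to_fiber.pop
  end
end

gen = ThreadFiber.new do |fiber, _first|
  fiber.pause(1)
  fiber.pause(2)
  3
end

values = [gen.resume, gen.resume, gen.resume]
puts values.inspect
```

Each transfer is just a queue hand-off, which is why transfer performance can be close to real coroutines. The cost shows up in the thread itself: a full native stack per fiber and a slow spin-up when no pooled thread is available.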

Page 39: JRuby: The Hard Parts

Hard Decisions

• ObjectSpace walks heap, off by default

• Trace functions add overhead, off by default

• Full coroutines not possible

• C extension API too difficult to emulate

• Perhaps only item to really hurt us

Page 40: JRuby: The Hard Parts

Native Integration

• Process control

• More selectable IO

• FFI layer

• C extension API

• Misc

Page 41: JRuby: The Hard Parts

Ruby’s Roots

• Matz is/was a C programmer

• Early Ruby did little more than stitch C calls together

• Some of those roots remain

• ttys, fcntl, process control, IO, ext API

• We knew we needed a solution

Page 42: JRuby: The Hard Parts

JNA, and then JNR

• Started with jna-posix to map POSIX

• stat, symlink, etc needed to do basics

• JNR replaced JNA

• Wayne Meissner started his empire…

Page 43: JRuby: The Hard Parts

The Cancer

• Many off-platform runtimes are not as good as HotSpot

• Many of their users must turn to C for perf

• So, since many people use C exts on MRI, maybe we need to implement the C ext API?

• Or get a student to do it…

Page 44: JRuby: The Hard Parts

MRI C Extensions

• Very invasive API

• Direct pointer access, object internals, conservative GC, threading constraints

• Like bridging one JNI to another

• Experimental in JRuby 1.6, gone in 1.7

• Will not revisit unless new API

Page 45: JRuby: The Hard Parts

FFI

• Ruby API/DSL for binding C libs

• Additional tools for generating that code

• If you need to go native, it’s the best way

• In use in production JRuby apps

• ØMQ client, bson lib, sodium crypto, …

Page 46: JRuby: The Hard Parts

Ruby FFI example

require 'ffi'

class Timeval < FFI::Struct
  layout :tv_sec => :ulong,
         :tv_usec => :ulong
end

module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :gettimeofday,
                  [ :pointer, :pointer ],
                  :int
end

t = Timeval.new
LibC.gettimeofday(t.pointer, nil)

Page 47: JRuby: The Hard Parts

Layered Runtime

jffi

jnr-ffi

libffi

jnr-posix

jnr-constants

jnr-enxio

jnr-x86asm

jnr-unixsocket

etc etc

Page 48: JRuby: The Hard Parts

Native in JRuby

• POSIX stuff missing from Java

• Ruby FFI DSL for binding C libs

• Stdio

• selection, remove buffering, control tty

• Process launching and control


Page 49: JRuby: The Hard Parts

Process Control

• Java’s ProcessBuilder/Process are bad

• No channel access (no select!)

• Spins up at least one thread per process

• Drains child output ahead of you

• New process API based on posix_spawn

Page 50: JRuby: The Hard Parts

in_c, in_p = IO.pipe
out_p, out_c = IO.pipe

pid = spawn('cat -n',
            :in => in_c,
            :out => out_c,
            :err => 'error.log')

[in_c, out_c].each(&:close)

in_p.puts("hello, world")
in_p.close

puts out_p.read # => " 1 hello, world"

Process.waitpid(pid)

Page 51: JRuby: The Hard Parts

Usability

• Backtraces

• Command-line and launchers

• Startup time

Page 52: JRuby: The Hard Parts

Backtraces

• JVM backtraces make Rubyists’ eyes bleed

• Initially, Ruby trace maintained manually

• JIT emits mangled class to produce a Ruby trace element

• AOT produces single class, mangled method name

• Mixed-mode backtraces!

Page 53: JRuby: The Hard Parts

at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.Closure.call(Closure.java:279)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:1911)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1184)
at org.codehaus.groovy.runtime.dgm$88.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:270)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
at BootStrap.populateBootstrapData(BootStrap.groovy:786)
at BootStrap.this$2$populateBootstrapData(BootStrap.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1009)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.jav

Page 54: JRuby: The Hard Parts

at org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:362)
at org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:50)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
at org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.RootNode.interpret(RootNode.java:129)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
at org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:184)
at org.jruby.RubyKernel.evalCommon(RubyKernel.java:1158)
at org.jruby.RubyKernel.eval19(RubyKernel.java:1121)
at org.jruby.RubyKernel$INVOKER$s$0$3$eval19.call(RubyKernel$INVOKER$s$0$3$eval19.gen)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)
at ruby.__dash_e__.method__1$RUBY$bar(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.block_0$RUBY$foo(-e:1)
at ruby$__dash_e__$block_0$RUBY$foo.call(ruby$__dash_e__$block_0$RUBY$foo)
at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:230)
at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.__file__(-e:1)
at ruby.__dash_e__.load(-e)

Page 55: JRuby: The Hard Parts
Page 56: JRuby: The Hard Parts

• org.jruby.RubyFixnum.times

• org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL

• rubyjit.Object$$foo_3AB1F5052668B3CD74A0B4CD4999CF6A65E92973271627940.__file__

• ruby.__dash_e__.method__0$RUBY$foo
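The mining of a Ruby backtrace out of a JVM trace can be sketched as a filter over frame strings. The frames below are copied from the JRuby trace on Page 54; the regular expression is an assumption inferred from how those frames look, not JRuby's actual demangling code, and it handles only compiled method and block frames (core methods such as RubyFixnum.times and interpreter frames are skipped).

```ruby
# Frames copied from the JRuby trace shown earlier in the deck.
JAVA_FRAMES = [
  'org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)',
  'ruby.__dash_e__.method__1$RUBY$bar(-e:1)',
  'java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)',
  'ruby.__dash_e__.block_0$RUBY$foo(-e:1)',
  'org.jruby.RubyFixnum.times(RubyFixnum.java:275)',
  'ruby.__dash_e__.method__0$RUBY$foo(-e:1)',
  'ruby.__dash_e__.__file__(-e:1)'
].freeze

# Assumed shape of a compiled frame:
#   ruby.<mangled file>.<kind>_<n>$RUBY$<name>(<file>:<line>)
COMPILED = /\Aruby\.[^.]+\.(?<kind>method|block)_+\d+\$RUBY\$(?<name>[^(]+)\((?<file>[^:]+):(?<line>\d+)\)\z/

# Keep only frames that encode a Ruby method or block, in Ruby's own format.
def ruby_backtrace(java_frames)
  java_frames.map { |frame|
    m = COMPILED.match(frame)
    next unless m

    label = m[:kind] == 'block' ? "block in #{m[:name]}" : m[:name]
    "#{m[:file]}:#{m[:line]}:in `#{label}'"
  }.compact
end

trace = ruby_backtrace(JAVA_FRAMES)
puts trace
```

Encoding the Ruby method name, file and line into the generated class and method names means the JVM's own stack trace carries everything needed, with no manually maintained Ruby stack.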

Page 57: JRuby: The Hard Parts

Command Line

• Rubyists typically are at CLI

• Command line and tty must behave

• Epic bash and .bat scripts

• 300-500 lines of heinous shell script

• Unusable in shebang lines

• Repurposed NetBeans native launcher

Page 58: JRuby: The Hard Parts

system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real 0m0.126s
user 0m0.092s
sys 0m0.031s

system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real 0m0.124s
user 0m0.089s
sys 0m0.033s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real 0m0.106s
user 0m0.080s
sys 0m0.022s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

real 0m0.110s
user 0m0.085s
sys 0m0.023s

Page 59: JRuby: The Hard Parts

Console Support

• Rubyists also typically use REPLs

• Readline support is a must

• jline has been forked all over the place

• Looking into JNA-based readline now

Page 60: JRuby: The Hard Parts

CLI == Startup Time

• BY FAR the #1 complaint

• May be the only reason we haven’t won!

• We’re trying everything we can

Page 61: JRuby: The Hard Parts

JRuby Startup

[Bar chart: time in seconds (lower is better), axis 0 to 10, for "-e 1", "gem --help", and "rake -T". Bars: C Ruby, JRuby.]

Page 62: JRuby: The Hard Parts

Tweaking Flags

• -client mode

• -XX:+TieredCompilation -XX:TieredStopAtLevel=1

• -X-C to disable JRuby’s compiler

• Heap sizes, code verification, etc etc
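As a sketch of how these flags combine on one command line, the helper below assembles an argument list. fast_start_command is a hypothetical name. The example relies on the jruby launcher's -J prefix for handing a flag to the JVM, plus the -X-C switch named above. Whether this combination helps on a given machine is something to measure, not assume.

```ruby
# Hypothetical helper that assembles a startup-oriented JRuby command line.
# -J hands the rest of the flag to the JVM; -X-C turns off JRuby's compiler.
def fast_start_command(*ruby_args)
  jvm_flags = ['-XX:+TieredCompilation', '-XX:TieredStopAtLevel=1']
  ['jruby', *jvm_flags.map { |flag| "-J#{flag}" }, '-X-C', *ruby_args]
end

cmd = fast_start_command('-e', '1')
puts cmd.join(' ')

# To actually run it (requires jruby on the PATH):
#   system(*cmd)
```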

Page 63: JRuby: The Hard Parts

Nailgun?

• Keep a single JVM running in background

• Toss commands over to it

• It stays hot, so code starts faster

• Hard to clean up all state (e.g. threads)

• Can’t get access to user’s terminal

• http://www.martiansoftware.com/nailgun/

Page 64: JRuby: The Hard Parts

Drip

[Diagram: three boxes, each labeled "Isolated JVM" and containing "Application" and "Command #1".]

Page 65: JRuby: The Hard Parts

Drip

• Start a new JVM after each command

• Pre-boot JVM plus optional code

• Analyze command line for differences

• Age out unused instances

• https://github.com/flatland/drip
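The bullets above describe a pool of pre-booted spares keyed by command line. The toy model below illustrates that idea only: SparePool, its methods, and the use of plain numbers for time are assumptions for this sketch, and a string stands in for a real JVM process. It is not how Drip itself is implemented.

```ruby
# Hypothetical model of the idea: keep one pre-booted spare per distinct
# command line, hand it out once, and boot a replacement in its place.
class SparePool
  def initialize(max_age, &boot)
    @max_age = max_age
    @boot = boot        # block that "boots" an instance for the given args
    @spares = {}        # args => [instance, booted_at]
  end

  # Return [instance, warm]. A warm instance was booted ahead of time.
  # Either way, boot a fresh spare for the next identical command.
  def checkout(args, now)
    key = args.dup.freeze
    instance, _booted_at = @spares.delete(key)
    warm = !instance.nil?
    instance ||= @boot.call(args)
    @spares[key] = [@boot.call(args), now]
    [instance, warm]
  end

  # Age out spares that nobody has claimed recently.
  def expire(now)
    @spares.delete_if { |_key, (_instance, booted_at)| now - booted_at > @max_age }
  end

  def size
    @spares.size
  end
end

boots = 0
pool = SparePool.new(60) { |args| boots += 1; "jvm-#{boots} for #{args.join(' ')}" }

_, warm1 = pool.checkout(%w[rake -T], 0)      # cold: nothing pre-booted yet
_, warm2 = pool.checkout(%w[rake -T], 10)     # warm: spare left by the first run
_, warm3 = pool.checkout(%w[gem --help], 20)  # cold: different command line
spares_before = pool.size
pool.expire(100)
spares_after = pool.size

puts [warm1, warm2, warm3, spares_before, spares_after].inspect
```

Because each command gets its own fresh instance, there is no shared state to clean up between runs, which is the main difference from the Nailgun approach.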

Page 66: JRuby: The Hard Parts

Drip Init

• Give Drip some code to pre-boot

• Load more libraries

• Warm up some code

• Pre-execution initialization

• Run as much as possible in background

• We also pre-load ./dripmain.rb if exists

Page 67: JRuby: The Hard Parts

$ cat dripmain.rb
# Preload some code Rails always needs
require File.expand_path('../config/application', __FILE__)

Page 68: JRuby: The Hard Parts

JRuby Startup

[Bar chart: time in seconds (lower is better), axis 0 to 10, for "rake -T". Bars: C Ruby, JRuby, JRuby (best), JRuby (drip), JRuby (drip init), JRuby (dripmain).]

Page 69: JRuby: The Hard Parts

CONCLUSION

Page 70: JRuby: The Hard Parts

Hard Parts

• 64k bytecode limit

• Falling over JIT limits

• String char[] pain

• Startup and warmup

• Coroutines

• FFI at JVM level

• Too many flags

• Tiered compiler slow

• Interpreter opto

• Bytecode is a blunt tool

• Indy has taken too long

• Charlie may burn out

Page 71: JRuby: The Hard Parts

Thank You!

• Charles Oliver Nutter

• @headius


• http://blog.headius.com

