jvm dive for mere mortals
@jkubrynski / kubrynski.com
JVM DIVE FOR MERE MORTALS
JAKUB KUBRYNSKI
[email protected] / @jkubrynski / http://kubrynski.com
$ WHOAMI
CO-FOUNDER OF DEVSKILLER / CODEARTE
TRAINER AT BOTTEGA
CONFITURA ORGANIZER
DEVOXX.PL PROGRAM COMMITTEE
ACKNOWLEDGEMENTS
MARTIN THOMPSON (@MJPT777)
ALEKSEY SHIPILËV (@SHIPILEV)
JAVA VIRTUAL MACHINE
LIFE CYCLE
idea -> feature on production
LIFE CYCLE
source -> javac -> bytecode
bytecode -> classloader -> interpreter
interpreter -> JIT -> optimized native code
SOURCE CODE
package com.random.company.app;

public class StringUtilsHelper {
    public boolean isEmpty(String str) {
        return str == null || str.length() == 0;
    }
}
JAVAC
converts source code into bytecode
performs checks
applies simple optimizations
CLASS FILE
ClassFile {
    u4             magic; // CAFEBABE
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
BYTECODE
list of operation codes
$ xxd -p Test.class
...1b04a0000504ac2a1b0464b600021b68ac...
1b      => iload_1
04      => iconst_1
a0 0005 => if_icmpne 7
04      => iconst_1
ac      => ireturn
2a      => aload_0
1b      => iload_1
04      => iconst_1
64      => isub
b6 0002 => invokevirtual #2
1b      => iload_1
68      => imul
ac      => ireturn
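The opcode sequence above reads like a small recursive method. As a rough sketch, the following source would plausibly compile to it. The class name `FactorialDemo` and the method name `fac` are assumptions made for illustration; the slide shows only the bytes of `Test.class`, not its source.

```java
public class FactorialDemo {
    // Hypothetical source: reconstructed from the opcodes, not taken from the talk.
    int fac(int n) {
        if (n == 1) {          // iload_1, iconst_1, if_icmpne
            return 1;          // iconst_1, ireturn
        }
        return fac(n - 1) * n; // aload_0, iload_1, iconst_1, isub, invokevirtual, iload_1, imul, ireturn
    }

    public static void main(String[] args) {
        System.out.println(new FactorialDemo().fac(5)); // prints 120
    }
}
```

Compiling it and running `javap -c FactorialDemo` prints the mnemonics next to their offsets, which makes it easy to compare against a hex dump.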
CLASSLOADER
dynamically loads classes
hierarchies
Bootstrap classloader
Extension classloader
Application classloader
Custom classloader
CLASSLOADING PHASES
loading -> reads class file
linking
    verifying -> verifies bytecode correctness
    preparing -> allocates memory
    resolving -> links with classes, interfaces, fields, methods
initializing -> static initializers
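The split between loading and initializing can be observed from plain Java. The sketch below is illustrative only; the names `InitPhases`, `Lazy` and `EVENTS` are made up. It loads a nested class without initializing it, then asks for initialization, and records when the static initializer actually runs.

```java
import java.util.ArrayList;
import java.util.List;

public class InitPhases {
    // Records the order in which things happen.
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static {
            // Runs in the "initializing" phase, not when the class file is read.
            EVENTS.add("initialized");
        }
    }

    static synchronized List<String> run() {
        if (EVENTS.isEmpty()) { // a class is initialized only once per classloader
            try {
                String name = InitPhases.class.getName() + "$Lazy";
                ClassLoader loader = InitPhases.class.getClassLoader();
                Class.forName(name, false, loader); // load, but do not initialize
                EVENTS.add("loaded");
                Class.forName(name, true, loader);  // now run static initializers
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [loaded, initialized]
    }
}
```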
INTERPRETER
template interpreter
detects the critical hot spots in the program
JIT
Just-In-Time
optimizes code
compiles methods into native code
-client (C1) / -server (C2)
runs up to 20 times faster
INLINING
public String getStringFromSupplier(Supplier<String> supplier) {
    return supplier.get();
}
public String businessMethod(String param) {
    Supplier<String> stringSupplier = new StringSupplier("my" + param);
    return getStringFromSupplier(stringSupplier);
}
// turns into
public String businessMethod(String param) {
    Supplier<String> stringSupplier = new StringSupplier("my" + param);
    return stringSupplier.get();
}
UNROLLING
private static String[] options = {"yes", "no", "true", "false"};
public void someMethod() {
    for (String opt : options) {
        process(opt);
    }
}
// turns into
public void someMethod() {
    process("yes");
    process("no");
    process("true");
    process("false");
}
SCALAR REPLACEMENT
public void record(int x, int y) {
    Point point = new Point(x, y);
    storePoint(point);
}
// inlining
public void record(int x, int y) {
    Point point = new Point(x, y);
    events.store("Added point", point.x, point.y);
}
// scalar replacement
public void record(int x, int y) {
    events.store("Added point", x, y);
}
DEAD CODE ELIMINATION
public void myMethod() {
    for (int i = 0; i < THRESHOLD; i++) {
        new String("test");
    }
}
// turns into
public void myMethod() {
}
LOCK ELISION
public void process(List<User> users) {
    List<User> result = new ArrayList<>();
    synchronized (result) {
        fillResult(users);
    }
}
// turns into
public void process(List<User> users) {
    List<User> result = new ArrayList<>();
    fillResult(users);
}
TYPE SHARPENING
List<User> users = new ArrayList<>();
// turns into
ArrayList<User> users = new ArrayList<>();
ON STACK REPLACEMENT
happens when the interpreter discovers that a method is looping
converts an interpreted stack frame into a native compiled stack frame
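A minimal sketch of code that tends to trigger on-stack replacement on HotSpot: the method is called only once, so it never becomes hot by invocation count, but its loop runs long enough for the back-edge counter to trip. The class and method names are illustrative, and whether OSR actually kicks in depends on the JVM version and flags.

```java
public class OsrDemo {
    // Invoked once; only the loop inside it is hot.
    static long sumTo(int limit) {
        long sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100_000_000));
    }
}
```

Running it as `java -XX:+PrintCompilation OsrDemo` should list `OsrDemo::sumTo` with a `%` marker, which is how HotSpot flags an OSR compilation in that log.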
TIERED COMPILATION
LEVELS
0: Interpreted code
1: Simple C1 compiled code
2: Limited C1 compiled code
3: Full C1 compiled code
4: C2 compiled code
WHY SHOULD I CARE?
JIT does most of the optimizations we could do manually, without "obfuscating" the source code
Performance/load tests should run only on a "hot" application
HOW TO TRACK?
When is your app back at full speed after a restart?
$ jstat -compiler <PID> 1s
// or
-XX:+PrintCompilation
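The same signal is available from inside the process through the standard `CompilationMXBean`. The sketch below polls the accumulated JIT time once per second; when the value stops growing, warm-up is probably over. The class name `JitWatch` and the five-sample loop are arbitrary choices made for this example.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitWatch {
    /** Accumulated JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.out.println("JIT time so far: " + totalCompilationMillis() + " ms");
            Thread.sleep(1000);
        }
    }
}
```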
EXECUTION COMPONENTS
program counter
frame
stack
STACK TRACE
"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.codearte.BlockBuilder.startBlock(BlockBuilder.groovy:21)
      at io.codearte.Generator.process(Generator.java:318)
      at io.codearte.ImportantApp.do(ImportantApp.java:64)
      at sun.reflect.NativeMethodImpl.invoke(NativeMethodImpl.java:18)
      at sun.reflect.NativeMethodImpl.invoke(NativeMethodImpl.java:62)
      at java.lang.reflect.Method.invoke(Method.java:497)
DEBUGGING
MEMORY LAYOUT
OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0    12          (object header)                           N/A
     12     4      int Product.id                                N/A
     16     4   String Product.name                              N/A
     20     4          (loss due to the next object alignment)
Instance size: 24 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0    12          (object header)                           N/A
     12     4      int Product.id                                N/A
     16     4      int Product.price                             N/A
     20     4   String Product.name                              N/A
Instance size: 24 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0    12          (object header)                           N/A
     12     4      int Product.id                                N/A
     16     4      int Product.price                             N/A
     20     1  boolean Product.available                         N/A
     21     3          (alignment/padding gap)                   N/A
     24     4   String Product.name                              N/A
     28     4          (loss due to the next object alignment)
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0    16          (object header)                           N/A
     16     4      int Product.id                                N/A
     20     4      int Product.price                             N/A
     24     1  boolean Product.available                         N/A
     25     7          (alignment/padding gap)                   N/A
     32     8   String Product.name                              N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total
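The padding in these dumps follows from simple alignment arithmetic. The sketch below reproduces the 32-byte variant by hand. It assumes a 12-byte header, 4-byte references, fields laid out in the order shown in the dump, and 8-byte object alignment. Those values are read off the dumps above; they are not guaranteed for every JVM, and the class name `LayoutMath` is illustrative.

```java
public class LayoutMath {
    /** Rounds offset up to the next multiple of alignment. */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    /** Header, int id, int price, boolean available, 4-byte reference name. */
    static int productSize() {
        int offset = 12;             // object header
        offset += 4;                 // int id            -> 16
        offset += 4;                 // int price         -> 20
        offset += 1;                 // boolean available -> 21
        offset = alignUp(offset, 4); // 3-byte gap before the reference -> 24
        offset += 4;                 // String name reference -> 28
        return alignUp(offset, 8);   // 4 bytes lost to object alignment -> 32
    }

    public static void main(String[] args) {
        System.out.println(productSize()); // prints 32
    }
}
```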
GARBAGE COLLECTOR
cleans memory
important performance factor
vector algorithm
stop the world in safepoints
GC ALGORITHMS
Serial
Parallel
Concurrent Mark Sweep
G1
GENERICS
LIST<PILOT>
GENERICS
STARRING: TYPE ERASURE
GENERICS
GENERICS LOVE DECLARATIONS
* EXCEPT LOCAL VARIABLES
class Pilots implements List<Pilot> { ... }      // generics
class Pilots extends ArrayList<Pilot> { ... }    // generics
List<Pilot> field;                               // generics
List<Pilot> getPilots() { ... }                  // generics
void getPilots(List<Pilots> pilots) { ... }      // generics
field = new ArrayList<>();                       // no generics :(
field = new ArrayList<Pilot>() {};               // generics :P
GENERICS
class Pilots extends ArrayList<Pilot> { ... }
Pilots.class.getGenericSuperclass()
// returns java.util.ArrayList<Pilot>
List<Pilot> field;
MyClass.class.getDeclaredField("field").getGenericType()
// returns java.util.List<Pilot>
List<Pilot> field = new ArrayList<>();
field.getClass().getGenericSuperclass()
// returns java.util.AbstractList<E>
List<Pilot> field = new ArrayList<Pilot>() {};
field.getClass().getGenericSuperclass()
// returns java.util.ArrayList<Pilot>
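The four cases above can be checked with a small, self-contained program. This is a sketch; `ErasureDemo` and its nested `Pilot` and `Pilots` types are stand-ins for the classes on the slides, so the printed names carry the `ErasureDemo$` prefix.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    static class Pilot { }
    static class Pilots extends ArrayList<Pilot> { }

    List<Pilot> field;

    /** Generic superclass of the runtime class of the given object. */
    static String superOf(Object o) {
        return o.getClass().getGenericSuperclass().getTypeName();
    }

    /** Declared generic type of the field named "field". */
    static String fieldType() {
        try {
            return ErasureDemo.class.getDeclaredField("field").getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(superOf(new Pilots()));               // type argument kept in the class declaration
        System.out.println(fieldType());                         // type argument kept in the field declaration
        System.out.println(superOf(new ArrayList<Pilot>()));     // erased: java.util.AbstractList<E>
        System.out.println(superOf(new ArrayList<Pilot>() { })); // anonymous subclass is a declaration again
    }
}
```

The anonymous subclass in the last line is the same trick that "type token" helpers in several libraries rely on.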
DISPATCH TYPES
invokevirtual
invokestatic
invokespecial
invokedynamic
LAMBDAS
generated by javac
bootstrapped by LambdaMetafactory
called with invokedynamic
LAMBDA UNDER THE HOOD
BigDecimal sumCreditEntries(Client client) {
    return sumEntries(client.getAccounts(), account -> account.getCreditEntries());
}
private static java.util.List lambda$sumCreditEntries$0(com.sandbox.Account);
private Period period;
BigDecimal sumCreditEntries(Client client) {
    return sumEntries(client.getAccounts(), account -> account.getCreditEntries(period));
}
private java.util.List lambda$sumCreditEntries$0(com.sandbox.Account);
BigDecimal sumCreditEntries(Client client, Period period) {
    return sumEntries(client.getAccounts(), account -> account.credit(period));
}
private static java.util.List lambda$sumCreditEntries$0(java.time.Period, com.sandbox.Account);
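The generated methods can also be listed through reflection rather than `javap`. The sketch below assumes the class was compiled with javac, which names these methods `lambda$<method>$<n>`; other compilers may use a different scheme. `LambdaPeek` and its members are invented for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LambdaPeek {
    private int offset = 1;

    // Captures nothing, so the generated method can be static.
    Function<Integer, Integer> stateless() {
        return x -> x + 1;
    }

    // Reads a field, so it captures `this` and becomes an instance method.
    Function<Integer, Integer> capturing() {
        return x -> x + offset;
    }

    /** Lists the compiler-generated lambda methods of this class. */
    static List<String> lambdaMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : LambdaPeek.class.getDeclaredMethods()) {
            if (m.getName().startsWith("lambda$")) {
                String kind = Modifier.isStatic(m.getModifiers()) ? "static " : "instance ";
                out.add(kind + m.getName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        lambdaMethods().forEach(System.out::println);
    }
}
```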
METHOD REFERENCE
SIMILAR TO LAMBDAS, BUT NO NEED TO GENERATE A METHOD
BECAUSE WE'RE CALLING A METHOD
BENCHMARKS
Benchmark                    Mode  Cnt   Score    Error  Units
CallTypes.baseline           avgt   30   4.163 ±  0.009  ns/op
CallTypes.lambda             avgt   30   4.174 ±  0.015  ns/op
CallTypes.methodRef          avgt   30   4.244 ±  0.049  ns/op
CallTypesExternal.baseline   avgt   30  50.055 ±  0.275  ns/op
CallTypesExternal.lambda     avgt   30  50.980 ±  0.650  ns/op
CallTypesExternal.methodRef  avgt   30  50.655 ±  0.376  ns/op
METHODHANDLES
METHODHANDLES
Replacement for reflection
Reflection does access control during invocation, while MethodHandle checks with lookup
EXAMPLE
MethodHandle toUpperCase = MethodHandles.lookup()
    .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
Object result = toUpperCase.invoke("test");
String exactResult = (String) toUpperCase.invokeExact("test");
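For comparison, here is the same call made through a method handle and through classic reflection, as a compact runnable sketch. The wrapper class `HandleDemo` and the exception handling are illustrative choices. Note that `invokeExact` requires the call site to match the handle type `(String)String` exactly, which is why the argument is typed as `String` and the result is cast.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class HandleDemo {
    static String viaHandle(String input) {
        try {
            // Access is checked once, here, against the lookup object.
            MethodHandle toUpperCase = MethodHandles.lookup()
                    .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
            return (String) toUpperCase.invokeExact(input);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static String viaReflection(String input) {
        try {
            Method toUpperCase = String.class.getMethod("toUpperCase");
            // Access is checked on each invoke call.
            return (String) toUpperCase.invoke(input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaHandle("test"));     // prints TEST
        System.out.println(viaReflection("test")); // prints TEST
    }
}
```

As the benchmark numbers in the talk suggest, the lookup itself is the expensive part, so in real code the handle would normally be resolved once and kept in a `static final` field.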
BENCHMARKS
Benchmark                 Mode  Cnt     Score     Error  Units
baseline                  avgt   30   198.993 ±   0.156  ns/op
handleExactWithoutLookup  avgt   30   208.354 ±   0.675  ns/op
handleWithoutLookup       avgt   30   209.902 ±   0.331  ns/op
reflectWithoutLookup      avgt   30   213.322 ±   0.430  ns/op
handleWithLookup          avgt   30  4306.501 ± 245.989  ns/op
reflectWithLookup         avgt   30   748.601 ±   2.566  ns/op
1ns = 0.000 001 ms = 0.000 000 001 s
STREAMS
API for collection processing
splits implementation and business logic
doesn't store elements -> it's just a pipeline
laziness gives space for optimizations
PERFORMANCE
strings.map(String::toLowerCase)
    .filter(s -> s.charAt(5) > 5)
    .map(s -> s.substring(6, 12))
    .collect(toList())
EACH STRING IS AROUND 24 CHARS
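To make the comparison concrete, here is a sketch of the benchmarked pipeline next to a hand-written loop doing the same work. It assumes `strings` is a `List<String>`; the class name `PipelineDemo` and the sample input are made up, and the actual benchmark code lives in the repository linked at the end of the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.toList;

public class PipelineDemo {
    static List<String> withStream(List<String> strings) {
        return strings.stream()
                .map(String::toLowerCase)
                .filter(s -> s.charAt(5) > 5)
                .map(s -> s.substring(6, 12))
                .collect(toList());
    }

    // The same three steps fused into a single loop body.
    static List<String> withLoop(List<String> strings) {
        List<String> result = new ArrayList<>();
        for (String s : strings) {
            String lower = s.toLowerCase();
            if (lower.charAt(5) > 5) {
                result.add(lower.substring(6, 12));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ABCDEFGHIJKLMNOPQRSTUVWX"); // 24 chars
        System.out.println(withStream(input)); // prints [ghijkl]
        System.out.println(withLoop(input));   // prints [ghijkl]
    }
}
```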
PERFORMANCE
-Xmx512m
Benchmark        (size)  Mode  Cnt       Score       Error  Units
for              100000  avgt   30    5946.020 ±    60.100  µs/op
stream           100000  avgt   30    6647.524 ±   123.752  µs/op
parallelStream   100000  avgt   30    2486.218 ±    49.030  µs/op
for             1000000  avgt   30  103638.567 ±  3367.418  µs/op
stream          1000000  avgt   30  108666.331 ±  2759.447  µs/op
parallelStream  1000000  avgt   30  139446.551 ±  5978.815  µs/op
for             1500000  avgt   30  340931.876 ± 32919.570  µs/op
stream          1500000  avgt   30  340603.189 ± 22086.747  µs/op
parallelStream  1500000  avgt   30  507793.070 ± 95685.964  µs/op
for             2000000  avgt   10  694607.055 ± 50240.340  µs/op
stream          2000000  avgt   30  686536.389 ± 20536.336  µs/op
parallelStream  2000000  OutOfMemoryError: GC overhead limit exceeded
GC OVERHEAD
-Xmx512m gc.alloc.rate.norm
Benchmark        (size)  Mode  Cnt       Score    Error  Units
for              100000  avgt   30    6896.776 ±  0.029  KB/op
stream           100000  avgt   30    6897.174 ±  0.462  KB/op
parallelStream   100000  avgt   30   10232.720 ±  0.005  KB/op
for             1000000  avgt   30   70745.169 ±  0.321  KB/op
stream          1000000  avgt   30   70745.585 ±  0.388  KB/op
parallelStream  1000000  avgt   30   98321.253 ±  0.994  KB/op
for             1500000  avgt   30  106122.045 ±  2.760  KB/op
stream          1500000  avgt   30  106122.462 ±  2.583  KB/op
parallelStream  1500000  avgt   30  147476.576 ± 23.135  KB/op
for             2000000  avgt   10  145153.644 ±  5.284  KB/op
stream          2000000  avgt   30  145154.058 ±  2.427  KB/op
parallelStream  2000000  OutOfMemoryError: GC overhead limit exceeded
IGNORE THE MEMORY
-Xmx4g
Benchmark        (size)  Mode  Cnt    Score    Error  Units
for              100000  avgt   30   23.966 ±  1.246  ms/op
stream           100000  avgt   30   24.838 ±  1.274  ms/op
parallelStream   100000  avgt   30    7.096 ±  0.131  ms/op
for             1000000  avgt   30  250.654 ±  8.956  ms/op
stream          1000000  avgt   30  260.075 ±  7.867  ms/op
parallelStream  1000000  avgt   30   76.781 ±  2.910  ms/op
for             2000000  avgt   30  533.450 ± 28.502  ms/op
stream          2000000  avgt   30  554.711 ± 38.503  ms/op
parallelStream  2000000  avgt   30  165.757 ±  9.707  ms/op
STREAMS SUMMARY
streams are cleaner and more readable than looping
serial streams have similar performance and overhead to manual looping
parallel streams are really fast, given enough memory
parallel streams bring bigger memory overhead due to storing partial results
parallel streams always use commonPool (we can hack them to use our own)
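The "hack" mentioned in the last point usually means starting the parallel stream from inside a task submitted to a dedicated `ForkJoinPool`. A minimal sketch follows, with the caveat that it relies on how the fork/join framework currently behaves (tasks forked from a pool's worker thread stay in that pool) rather than on anything the Stream API documents. `OwnPoolDemo` and `squaresInOwnPool` are illustrative names.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OwnPoolDemo {
    static List<Integer> squaresInOwnPool(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The terminal operation runs on a worker of `pool`,
            // so its subtasks are forked into `pool` as well.
            return pool.submit(() ->
                    IntStream.range(0, n)
                            .parallel()
                            .map(i -> i * i)
                            .boxed()
                            .collect(Collectors.toList())
            ).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInOwnPool(5, 2)); // prints [0, 1, 4, 9, 16]
    }
}
```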
EXCEPTIONS
public class ClientAlreadyExistsException extends Throwable {
}
EXCEPTIONS
Benchmark                      Mode  Cnt     Score   Error  Units
Exceptions.standardExcept      avgt   30  1029.919 ± 5.026  ns/op
Exceptions.standardExceptDeep  avgt   30  1121.771 ± 6.615  ns/op
DEEP MEANS THERE ARE 4 MORE FRAMES
EXCEPTIONS
public class ClientAlreadyExistsException extends Throwable {
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}
EXCEPTIONS
Benchmark                       Mode  Cnt     Score   Error  Units
Exceptions.standardExcept       avgt   30  1029.919 ± 5.026  ns/op
Exceptions.standardExceptDeep   avgt   30  1121.771 ± 6.615  ns/op
Exceptions.stacklessExcept      avgt   30    18.827 ± 0.066  ns/op
Exceptions.stacklessExceptDeep  avgt   30    19.835 ± 0.053  ns/op
DEEP MEANS THERE ARE 4 MORE FRAMES
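A quick way to confirm that the override really skips stack capture is to throw the exception and count the recorded frames. The sketch below nests the exception class inside a hypothetical `StacklessDemo` wrapper to keep it self-contained.

```java
public class StacklessDemo {
    static class ClientAlreadyExistsException extends Throwable {
        ClientAlreadyExistsException(String message) {
            super(message);
        }

        // Skips the stack walk that dominates the cost of creating an exception.
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this;
        }
    }

    /** Throws and catches the stackless exception, returning how many frames it recorded. */
    static int capturedFrames() {
        try {
            throw new ClientAlreadyExistsException("client already exists");
        } catch (ClientAlreadyExistsException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println(capturedFrames()); // prints 0
    }
}
```

Since Java 7 the same effect is available without overriding anything, by calling the protected `Throwable(String, Throwable, boolean, boolean)` constructor with `writableStackTrace` set to `false`. The trade-off in both cases is that logs no longer show where the exception came from.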
FURTHER READING
Optimizing Java - Benjamin J. Evans, James Gough
The Well-Grounded Java Developer - Benjamin J. Evans, Martijn Verburg
Java Performance - Charlie Hunt, Binu John
Java Performance: The Definitive Guide - Scott Oaks
TOOLS
jdk
Visual VM
Mission Control
JProfiler
Honest Profiler
Java Object Layout
I WANT MORE!
THE JAVA® VIRTUAL MACHINE SPECIFICATION
HG CLONE HTTP://HG.OPENJDK.JAVA.NET/JDK8/JDK8/
HTTP://OPENJDK.JAVA.NET/PROJECTS/CODE-TOOLS/JMH
BENCHMARKS
HTTPS://GITHUB.COM/JKUBRYNSKI/JVM-DIVE-BENCHMARKS
QUESTIONS?
THANKS!