JVM Memory Model - Yoav Abrahami, Wix
TRANSCRIPT
JVM
Memory Model
JIT
Anomalies
• How long does it take to count to 100?
• How long does it take to append to a list?
To sort a list?
• How long does it take to append to a
vector? To sort a vector?
Dynamic vs Static Compilation
• Static Compilation
– “ahead-of-time” (AOT) compilation
– Source code -> Native executable
– Compiles before executing
• Dynamic compiler (JIT)
– “just-in-time” (JIT) compilation
– Source -> bytecode -> interpreter -> JITed
– Most compilation happens during execution
JIT Compilation
• Aggressive optimistic optimizations
– Through extensive usage of profiling info
– Limited budget (CPU, Memory)
– Startup speed may suffer
• The JIT
– Compiles bytecode when needed
– Maybe immediately before execution?
– Maybe never?
JVM JIT Compilation
• Eventually JITs bytecode
– Based on profiling
– After 10,000 invocations, again after 20,000 invocations
• Profiling allows focused code-gen
• Profiling allows better code-gen
– Inline what’s hot
– Loop unrolling, range-check elimination, etc.
– Branch prediction, spill-code-gen, scheduling
JVM JIT Compilation
• JVM applications operate in mixed mode
• Interpreted
– Bytecode-walking
– Artificial stack machine
• Compiled
– Direct native operations
– Native register machine
JVM application utilization
Optimizations in the HotSpot JVM
Inlining
int addAll(int max) {
int accum = 0;
for (int i=0; i < max; i++) {
accum = add(accum, i);
}
return accum;
}
int add(int a, int b) {
return a+b;
}
// after inlining add()
int addAll(int max) {
int accum = 0;
for (int i=0; i < max; i++) {
accum = accum + i;
}
return accum;
}
Loop unrolling
public void foo(int[] arr, int a) {
for (int i=0; i<arr.length; i++) {
arr[i] += a;
}
}
// after unrolling by 4
public void foo(int[] arr, int a) {
int limit = arr.length / 4;
for (int i=0; i<limit ; i++){
arr[4*i] += a; arr[4*i+1] += a;
arr[4*i+2] += a; arr[4*i+3] += a;
}
for (int i=limit*4; i<arr.length; i++) {
arr[i] += a;
}
}
Escape Analysis
public int m1() {
Pair p = new Pair(1,2);
return m2(p);
}
public int m2(Pair p) {
return p.first + m3(p);
}
public int m3(Pair p) {
return p.second;
}
// after deep inlining
public int m1() {
Pair p = new Pair(1,2);
return p.first + p.second;
}
// optimized version
public int m1() {
return 3;
}
Monitoring the JIT
• Info about compiled methods
– -XX:+PrintCompilation
• Info about inlining
– -XX:+PrintInlining
– Also requires -XX:+UnlockDiagnosticVMOptions
• Print the assembly code
– -XX:+PrintAssembly
– Also requires -XX:+UnlockDiagnosticVMOptions
– On Mac OS, requires adding hsdis-amd64.dylib to the LD_LIBRARY_PATH environment variable
Challenge
• Rerun the benchmarks, this time using
1. -XX:+PrintCompilation
2. -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
JVM Memory
The Java Memory Model
Java Memory Model
• The Java Memory Model (JMM) describes
how threads in the Java (and Scala)
programming languages interact through
memory.
• It provides sequential consistency for
data-race-free programs.
Instruction Reordering
• Program Order
int a=1;
int b=2;
int c=3;
int d=4;
int e = a + b;
int f = c - d;
• Execution Order
int d=4;
int c=3;
int f = c - d;
int b=2;
int a=1;
int e = a + b;
Anomaly
• Two threads running, with x = y = 0 initially
Thread 1: x=1; j=y
Thread 2: y=1; i=x
• What will be the result?
i=1, j=1
i=0, j=1
i=1, j=0
i=0, j=0
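The talk's repo builds this experiment in Scala; here is a minimal Java sketch of the same idea (class and method names are illustrative, not from the repo). Because neither thread synchronizes, the compiler and the processor may reorder the two statements in each thread, so even i=0, j=0 is a legal outcome under the JMM.

```java
// Illustrative sketch of the two-thread reordering anomaly from the slide.
public class ReorderDemo {
    static int x, y, i, j;

    // Run one trial of the experiment and return the observed (i, j).
    static int[] trial() {
        x = 0; y = 0; i = 0; j = 0;
        Thread t1 = new Thread(() -> { x = 1; j = y; });
        Thread t2 = new Thread(() -> { y = 1; i = x; });
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { i, j };
    }

    public static void main(String[] args) {
        int surprising = 0;
        for (int n = 0; n < 1000; n++) {
            int[] r = trial();
            // (0, 0) is the outcome that reordering makes possible
            if (r[0] == 0 && r[1] == 0) surprising++;
        }
        System.out.println("i=0, j=0 observed " + surprising + " times out of 1000");
    }
}
```

Whether the surprising outcome actually shows up depends on the hardware and how the JIT compiles the lambdas, which is exactly why the slide asks "do we see the anomaly?"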
Let’s Check
• Let’s build the scenario
val t1 = new Thread(new Runnable {
def run() {
// sleep a little to add some uncertainty
Thread.sleep(1)
x=1
j=y
}
})
• Then run it a few times
• Do we see the anomaly?
Happens Before Ordering
• Defines constraints on instruction reordering
• A monitor release happens-before a matching monitor acquire
• A volatile write happens-before a subsequent read of the same volatile field
– For non-volatile fields, this is not necessarily the case!
• Assignment dependencies within a single thread are preserved
• Happens Before ordering is transitive
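The transitivity rule is what makes safe publication work. A hedged Java sketch (names are illustrative): a plain write before a volatile write becomes visible to any thread that reads the volatile flag, even though the data field itself is not volatile.

```java
// Safe publication via a volatile flag, using happens-before transitivity:
// write(data) hb write(ready) hb read(ready == true) hb read(data).
public class Publication {
    static int data = 0;                   // plain, non-volatile field
    static volatile boolean ready = false; // volatile flag

    static int readWhenReady() {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            data = 42;    // ordinary write...
            ready = true; // ...published by the volatile write
        });
        writer.start();
        while (!ready) { } // volatile read; spins until the flag flips
        return data;       // guaranteed to observe 42
    }

    public static void main(String[] args) {
        System.out.println(readWhenReady());
    }
}
```

If `ready` were not volatile, the spin loop could run forever and the read of `data` could observe 0.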
Anomaly
• Let’s see how far we can count in 100 milliseconds
var running = true
• Let thread 1 count
var count = 0
while (running)
  count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false")
Volatile
• Compilers can reorder instructions
• Compilers can keep values in registers
• Processors can reorder instructions
• Values may be in different caching levels
and not synced to main memory
• JMM is designed for aggressive
optimizations
Volatile
• Modern processor cache hierarchy: four cores, each with its own L1 and L2; pairs of cores share an L3; all cores share main memory
• Typical access latencies:
– Core registers: < 1 ns
– L1: ~1 ns (3-4 cycles)
– L2: ~3 ns (10-12 cycles)
– L3: ~15 ns (40-45 cycles)
– Main Memory (DRAM): ~65 ns
Volatile
• Volatile instructs the compiler and processor to sync the value to main memory on every access
– Does not rely on the L1, L2 or L3 cache
• Volatile reads / writes cannot be reordered
• Volatile longs and doubles are atomic
– long and double are 64-bit types, while the processor only guarantees 32-bit atomicity by default
Resolve the Anomaly
• Let’s see how far we can count in 100 milliseconds
@volatile var running = true
• Let thread 1 count
var count = 0
while (running)
  count = count + 1
println(count)
• Let thread 2 signal thread 1 to stop
Thread.sleep(100)
running = false
println("thread 2 set running to false")
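The same fix, sketched in Java (the talk's version is Scala; class and method names here are assumptions for illustration). Marking the flag `volatile` guarantees the counter thread eventually sees the write and the loop terminates.

```java
// Stop-flag pattern: volatile guarantees the counter thread sees the update.
public class StopFlag {
    static volatile boolean running = true;

    // Count in a background thread for roughly `millis` ms, then stop it.
    static long countWhileRunning(long millis) {
        running = true;
        final long[] count = { 0 };
        Thread counter = new Thread(() -> {
            long c = 0;
            while (running) c++;   // volatile read on every iteration
            count[0] = c;
        });
        counter.start();
        try {
            Thread.sleep(millis);
            running = false;       // volatile write: visible to the counter
            counter.join();
        } catch (InterruptedException e) {
            running = false;
            Thread.currentThread().interrupt();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countWhileRunning(100));
    }
}
```

Without `volatile`, the JIT may hoist the read of `running` out of the loop, and the counter thread can spin forever.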
Anomaly
• Let’s count to 10,000
• But let’s use 10 threads, each adding 1,000 to our count
var count = 0
• Each of the 10 threads does
for (i <- 1 to 1000)
  count = count + 1
• What did we get?
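A Java sketch of the same experiment (names are illustrative). Because `count = count + 1` compiles to a separate read, add, and write, concurrent increments can interleave and lose updates, so the result is typically below 10,000.

```java
// Lost-update anomaly: 10 threads each increment a shared counter 1,000 times.
public class LostUpdates {
    static int count;

    static int raceCount(int nThreads, int perThread) {
        count = 0;
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    count = count + 1;  // getfield / iadd / putfield: three steps
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try { th.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return count;  // often less than nThreads * perThread
    }

    public static void main(String[] args) {
        System.out.println(raceCount(10, 1000));
    }
}
```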
Synchronization
• Let’s have another look at the assignment
count = count + 1
• Is this a single instruction?
• javap
– javap <class> - Print the class signature
– javap -c <class> - Print the class bytecode
Synchronization
• The bytecode for count = count + 1
14: getfield #38 // Field scala/runtime/IntRef.elem:I
17: iconst_1
18: iadd
19: putfield #38 // Field scala/runtime/IntRef.elem:I
Synchronization
• The bytecode for count = count + 1
// Read the current counter value from field #38
// and push it onto the stack
14: getfield #38 // Field scala/runtime/IntRef.elem:I
// Push the constant 1 onto the stack
17: iconst_1
// Pop the top two stack elements, add them as integers,
// and push the result
18: iadd
// Pop the top of the stack and store it into field #38,
// assuming it is an integer
19: putfield #38 // Field scala/runtime/IntRef.elem:I
Synchronization Tools
[Diagram: actions by thread 1, then thread 1 “releases” a monitor; thread 2 “acquires” the same monitor, then performs its actions. The release happens-before the acquire, so thread 1’s actions are visible to thread 2.]
Synchronization Tools
• Synchronization tools allow grouping instructions as if “one atomic instruction”
– Only one thread can perform the code at a time
• Some tools
– Synchronized
– ReentrantLock
– CountDownLatch
– Semaphore
– ReentrantReadWriteLock
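One of the listed tools, CountDownLatch, sketched in Java (names are illustrative): it lets one thread wait for N others, and each `countDown()` happens-before a successful `await()`, so the workers' writes are visible afterwards.

```java
import java.util.concurrent.CountDownLatch;

// CountDownLatch: wait for N workers, with visibility of their writes.
public class LatchDemo {
    static int sumAfterLatch(int workers) {
        int[] results = new int[workers];
        CountDownLatch done = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                results[id] = id + 1;  // write before countDown()...
                done.countDown();      // ...happens-before await() returning
            }).start();
        }
        try {
            done.await();  // blocks until all workers counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int sum = 0;
        for (int r : results) sum += r;  // all writes are visible here
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAfterLatch(4));  // 1+2+3+4 = 10
    }
}
```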
Synchronization Tools
• Simplest tool – synchronized
// for each thread
for (i <- 1 to 1000)
  synchronized {
    count = count + 1
  }
• Works relative to ‘this’
Synchronization Tools
• Using ReentrantLock
// before the threads
val lock = new ReentrantLock()
// for each thread
for (i <- 1 to 1000) {
lock.lock()
try {
count = count + 1
}
finally {
lock.unlock()
}
}
Atomic Operations
• Containers for simple values or references
with atomic operations
• getAndIncrement
• getAndDecrement
• getAndAdd
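Replacing the plain counter with an AtomicInteger fixes the earlier lost-update anomaly without any locks; a Java sketch (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free counter: getAndIncrement() is an atomic read-modify-write.
public class AtomicCounter {
    static int countTo(int nThreads, int perThread) {
        AtomicInteger count = new AtomicInteger(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    count.getAndIncrement();  // no updates can be lost
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try { th.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countTo(10, 1000));  // always 10000
    }
}
```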
Atomic Operations
• All are based on compareAndSwap
– From the sun.misc.Unsafe class
– Used to implement spin locks
Atomic Operations
• Spin lock, as implemented in AtomicInteger:
public final int getAndIncrement() {
  for (;;) {
    int current = get();
    int next = current + 1;
    if (compareAndSet(current, next))
      return current;
  }
}
public final boolean compareAndSet(int expect, int update) {
  return unsafe.compareAndSwapInt(this,
    valueOffset, expect, update);
}
References
• The examples on Github
https://github.com/yoavaa/jvm-memory-model
Questions?