concurrency on the jvm
DESCRIPTION
Concurrency on the JVM, showing some of the nuts and bolts of Akka (I presume .. it's not first-hand information, just speculation). The Java Memory Model, thread pools, actors and the like will be covered.

TRANSCRIPT
![Page 1: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/1.jpg)
Concurrency on the JVM.. or some of the nuts and bolts of Akka
Bernhard Huemer · bhuemer.at · @bhuemer
23 July 2013
IRIAN Solutions · irian.at
![Page 2: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/2.jpg)
Agenda

• In general, a (random) selection of (more or less loosely coupled) points I would like to address
• Low-level concurrency - only once you understand the complexity will you appreciate the solution :)
• Thread pools, contention issues around them and the enlightened path to Akka
• What's missing: a lot. Software transactional memory (Clojure), data-flow concurrency, futures and more theory I wanted to cover
![Page 3: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/3.jpg)
We’ll focus on utilisation
• “The number of idle cores on my machine doubles every two years” - Sander Mak (DZone interview)
• Distinction between low latency (produce one answer fast) and high throughput (produce lots of answers fast) somewhat fuzzy anyway
![Page 4: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/4.jpg)
Synchronisation is not the enemy

• Locks are not expensive, lock contention is - don't shoot the messenger! Does contention at junctions arise because of traffic lights or because of bad traffic planning?
• Most locking in Java programs is not only uncontended, but also unshared
• Rule of thumb: think about contention first, and only then worry about your locking.

See also: Brian Goetz, "Threading lightly, Part 1: Synchronization is not the enemy", http://www.ibm.com/developerworks/library/j-threads1/index.html

Note: When benchmarking your application, don't deliberately provoke contention that wouldn't arise otherwise!
![Page 5: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/5.jpg)
Synchronised on the JVM
• Optimised for the uncontended case (i.e. the usual one) - can be handled entirely within the JVM (i.e. no OS calls)
• Lightweight locking based on CAS instructions
Example: implementation of thin locks in IBM's version of the JDK 1.1.2 for AIX (yes, yes, .. totally outdated, but you get the idea ..)
See also: http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon98Thin.pdf
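To make the CAS idea concrete, here is a hypothetical sketch (my illustration, not the thin-lock algorithm from the paper and not how HotSpot actually implements `synchronized`): in the uncontended case a single compare-and-set acquires the lock with no OS involvement.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of CAS-based lightweight locking: the common,
// uncontended case is a single compareAndSet with no OS call at all.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Fast path: one CAS succeeds when there is no contention.
        while (!locked.compareAndSet(false, true)) {
            // Slow path: a real JVM would "inflate" to an OS-level fat
            // lock here instead of burning CPU by spinning.
            Thread.yield();
        }
    }

    void unlock() {
        locked.set(false);
    }

    boolean isLocked() {
        return locked.get();
    }
}
```

A real implementation also has to handle reentrancy and lock inflation; this only shows the fast path the slide is talking about.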
![Page 6: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/6.jpg)
Locking benchmarks (1)
Shamelessly taken from: http://www.ibm.com/developerworks/library/j-jtp11234/
Not the code used in the benchmarks(!) - this is just to illustrate it (and to show off my uber non-blocking locking skills*)
* You are allowed to keep any bugs you find at your discretion.
![Page 7: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/7.jpg)
Locking benchmarks (2)
See also: Brian Goetz, “Java Concurrency in Practice”, Chapter 15
Uncontended version
Contended version
“With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance.” - Brian Goetz
• Think roundabouts vs. traffic lights
• Benchmark is deceptive as it produces an unusually high amount of contention. Atomics scale quite nicely in reality.
• Actual lesson learned: always measure before you assume anything! There is no general performance advice.
[Two graphs from JCIP: throughput vs. number of threads (2, 4, 8, 16, 32, 64), comparing ReentrantLock and AtomicInteger under uncontended and contended workloads]
Note: The graph is not based on values I measured - it's from JCIP, and I didn't use a ruler to measure points in the pictures. It's not accurate and doesn't aim to be!
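For reference, the two counter variants the benchmark compares look roughly like this (my minimal sketch, not the actual JCiP benchmark code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Lock-based counter: blocks other threads while incrementing.
class LockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    int increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }
}

// Atomic counter: a CAS loop under the hood - retries on contention
// instead of blocking, which is why it scales better at low contention.
class AtomicCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        return value.incrementAndGet();
    }
}
```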
![Page 8: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/8.jpg)
Lock splitting

• Incohesive classes tend to coarsen lock granularity ..
• .. at least make your locks cohesive by splitting them (even better: write cohesive classes to begin with!)
• Only a short-term solution to contention - in this case, as soon as you double the load, you're back where you started
See also: Brian Goetz, “Java Concurrency in Practice”, Chapter 11
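A minimal sketch of the splitting idea (hypothetical class, modelled on the kind of example Chapter 11 uses): independent state gets independent locks instead of one lock on `this`.

```java
// Lock splitting: `users` and `queries` are unrelated, so guarding them
// with separate lock objects halves the contention on this class.
class ServerStats {
    private final Object userLock = new Object();
    private final Object queryLock = new Object();

    private int users;    // guarded by userLock
    private int queries;  // guarded by queryLock

    int addUser() {
        synchronized (userLock) { return ++users; }
    }

    int addQuery() {
        synchronized (queryLock) { return ++queries; }
    }
}
```

A thread adding a user no longer blocks a thread adding a query - but as the slide says, double the load on either field and you're contended again.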
![Page 9: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/9.jpg)
Lock striping

• Extends the lock splitting idea, but works on partitions of variably sized data
• Classic example: ConcurrentHashMap - 16 segments, each with its own lock, rather than one "global" lock
• Effectiveness depends on the number of available processors and the likelihood that threads end up locking the same partition (e.g. non-uniformly distributed data)
• To some extent also a trade-off between memory and performance (e.g. do you really need 16 segments in every ConcurrentHashMap? They're not that cheap!)
See also: http://ria101.wordpress.com/2011/12/12/concurrenthashmap-avoid-a-common-misuse/ and of course “Java Concurrency in Practice”, Chapter 11
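The striping idea in miniature (a hypothetical counter of mine, not ConcurrentHashMap's actual segment code): hash each key onto one of N stripes so that threads touching different stripes never contend.

```java
// Lock striping: 16 stripes, each with its own lock, instead of one
// global lock over the whole counts array.
class StripedCounter {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    private final int[] counts = new int[STRIPES];

    StripedCounter() {
        for (int i = 0; i < STRIPES; i++) locks[i] = new Object();
    }

    private int stripe(Object key) {
        return Math.floorMod(key.hashCode(), STRIPES);
    }

    void increment(Object key) {
        int s = stripe(key);
        synchronized (locks[s]) { counts[s]++; }
    }

    // Whole-structure operations need every stripe lock - the same reason
    // size() was expensive on the segmented ConcurrentHashMap.
    int total() {
        int sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) { sum += counts[i]; }
        }
        return sum;
    }
}
```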
![Page 10: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/10.jpg)
Layers of synchronisation
• High-level concurrency abstractions (java.util.concurrent, scala.concurrent)
• Low-level locking (synchronized() blocks and util.concurrent.locks)
• Low-level primitives (volatile variables, util.concurrent.atomic classes)
• Data races: deliberate undersynchronisation (Avoid!)
Shamelessly taken from: Jeremy Manson, “Advanced Topics in Programming Languages: The Java Memory Model” http://www.youtube.com/watch?v=1FX4zco0ziY
Let’s take a step back for a moment ..
![Page 11: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/11.jpg)
Synchronisation addresses two distinct issues
• Thread-interference or atomicity
• Visibility, ordering and memory consistency (i.e. what volatile is about)
Quantum concurrency and Schrödinger’s memory tricks:
The thread we’ll use to observe the value of the counter has an effect on the observation!
![Page 12: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/12.jpg)
Why is this code broken?

• Double-checked locking and concurrent collections, so what's the problem then? (Don't argue about whether or not caches should preload everything up-front - fair point, but that's not the issue here)
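The code from the slide isn't reproduced in this transcript; the classic shape of the bug looks like this hypothetical cache (field names are my invention):

```java
import java.util.HashMap;
import java.util.Map;

// Broken double-checked locking: `cache` is NOT volatile, so another
// thread may observe a non-null reference to a map whose internal
// fields are not yet visible (the write can appear reordered).
class BrokenCache {
    private Map<String, String> cache; // BROKEN: missing volatile

    Map<String, String> get() {
        if (cache == null) {                  // 1st check: no lock
            synchronized (this) {
                if (cache == null) {          // 2nd check: under the lock
                    cache = new HashMap<>();  // may look "published" to a
                }                             // reader before it's initialised
            }
        }
        return cache;
    }
}
```

Single-threaded this works fine every time, which is exactly what makes it so treacherous.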
![Page 13: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/13.jpg)
Can you see it now?

• Semantically speaking, this is exactly the same code
• The compiler, the JVM, the operating system and even the CPU conspire behind your back in the Extraordinary League of Ordinary Things That Will Mess You Up! Most likely, they're sinister enough to wait until you deploy to production before they show their true faces!
See also: Most/many double-checked locking implementations around Singletons
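For completeness (the slides don't show it), the standard repair for a double-checked singleton is a `volatile` field, which restores the happens-before edge between the write and any later read - a hedged sketch:

```java
// Correct double-checked locking: `instance` is volatile, so once a
// reader sees a non-null reference, it also sees a fully constructed
// object.
class FixedSingleton {
    private static volatile FixedSingleton instance;

    static FixedSingleton get() {
        FixedSingleton result = instance;  // one volatile read on the fast path
        if (result == null) {
            synchronized (FixedSingleton.class) {
                result = instance;
                if (result == null) {
                    instance = result = new FixedSingleton();
                }
            }
        }
        return result;
    }
}
```

(In practice an enum or a static holder class is usually simpler than getting this idiom right.)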
![Page 14: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/14.jpg)
Happens-before relationships

• Monitor lock rule. An unlock on a monitor lock happens before every subsequent lock on that same monitor lock.
• Volatile variable rule. A write to a volatile field happens before every subsequent read of that same field.
• ...
• Transitivity. If A happens before B, and B happens before C, then A happens before C.
Shamelessly taken from: “Java theory and practice: Fixing the Java Memory Model, Part 2“,http://www.ibm.com/developerworks/library/j-jtp03304/
![Page 15: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/15.jpg)
Volatile piggybacking (1)
Shamelessly taken from Viktor Klang's GitHub: https://gist.github.com/viktorklang/2362563
• With high-level concurrency frameworks, you may not have to worry about these issues (note: plain, vanilla thread pools are not high-level enough - a very fragile technique anyway)
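The gist itself isn't in this transcript, but the piggybacking idea looks roughly like this (my sketch, not Klang's code): a volatile write publishes *all* writes that happened before it, so a plain field can ride along on a volatile flag.

```java
// Volatile piggybacking: the non-volatile `payload` becomes safely
// visible because its write happens before the volatile write to
// `ready`, and the volatile read of `ready` happens before the read
// of `payload` (volatile rule + program order + transitivity).
class Piggyback {
    private int payload;            // deliberately NOT volatile
    private volatile boolean ready; // the barrier we piggyback on

    void publish(int value) {
        payload = value; // ordinary write ..
        ready = true;    // .. published by the volatile write after it
    }

    // Returns -1 if nothing has been published yet.
    int read() {
        if (ready) {        // volatile read pairs with the write above
            return payload; // guaranteed to see `value`
        }
        return -1;
    }
}
```

Swap the two lines in `publish` and the guarantee evaporates - the ordering is the whole trick.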
![Page 16: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/16.jpg)
Volatile piggybacking (2)

• A repetitive exercise, I know, but why can't we rely on thread pools for memory consistency? They do have locks internally! (I promise, you'll understand concurrent code a lot better if you think this through!)

Hint: Think about happens-before relationships with regard to locks and multiple workers (i.e. what are the release/acquire pairs for your memory barriers?)
![Page 17: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/17.jpg)
A word on immutability (1)

• The Java Memory Model treats final fields / val fields specially (the value must be assigned before the constructor returns and cannot be re-assigned)
Actors and the Java Memory Model. In most cases messages are immutable, but if that message is not a properly constructed immutable object, without a "happens before" rule, it would be possible for the receiver to see partially initialized data structures and possibly even values out of thin air (longs/doubles).
See: http://typesafe.com/blog/akka-and-the-java-memory-model
![Page 18: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/18.jpg)
A word on immutability (2)
• Its state cannot be modified after construction (i.e. no getters that return mutable objects, nothing passed to the constructor references mutable objects held by this one, etc.)
• All fields are declared as final / val *
• It is properly constructed (i.e. the this reference doesn’t escape during construction)
This gives us a precisely defined notion of immutability.

* Yes, java.lang.String is not immutable according to that definition: hash codes are cached and there actually is a data race in that method, but it's a benign one. So for all intents and purposes, java.lang.String can still be considered an immutable class.
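The String-style benign race is worth seeing spelled out - a hypothetical class of mine imitating the idiom (String's real code is structured a little differently, but treats 0 as "not yet computed" in the same way):

```java
// Benign data race: two threads may both compute the hash, but they
// compute the SAME value over immutable state, so a lost update is
// harmless - every writer writes the identical number.
final class Name {
    private final char[] chars; // immutable: final and never leaked
    private int hash;           // racy cache, 0 means "not computed yet"

    Name(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;           // read the field ONCE into a local
        if (h == 0) {
            for (char c : chars) h = 31 * h + c;
            hash = h;           // racy write, but always the same value
        }
        return h;
    }
}
```

The single read into a local is essential: checking `hash` twice could observe two different values under the race.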
![Page 19: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/19.jpg)
Tasks and thread pools

• Heterogeneous tasks are annoying when you aim for utilisation (a bit theoretical, though, as it presumably averages out .. but ..)
• Dependent tasks cause even more issues (possibly even deadlocks, if it's a bounded thread pool)

Sequential: Task A, then Task B (10x Task A)
Parallel: Task A alongside Task B (10x Task A)

Result: a whopping 9% speedup! (well, we still need to deduct something for concurrency overhead ..)
![Page 20: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/20.jpg)
Configuring thread pools (1)

| | Worker "queue" (no such queue really exists, but we'll just think that way) | Task queue |
|---|---|---|
| newFixedThreadPool | bounded - n | unbounded LinkedBlockingQueue |
| newSingleThreadExecutor | bounded - 1 | unbounded LinkedBlockingQueue |
| newCachedThreadPool | unbounded | SynchronousQueue |
| alternative invocation of new ThreadPoolExecutor(..) | bounded - n | bounded - m, m > n: LinkedBlockingQueue(m) |
| alternative invocation of new ThreadPoolExecutor(..) | bounded - n | SynchronousQueue |

The same implementation can exhibit radically different behaviour depending on how you instantiate it.

Note: SynchronousQueues are not just LinkedBlockingQueues with capacity 1. They're more like rendezvous channels in CSP.
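The factory methods above are all thin wrappers around `ThreadPoolExecutor` - the behaviour comes entirely from the constructor arguments. A sketch of the two "alternative invocation" rows, with illustrative values n = 4 and m = 64 (my numbers, not from the slides):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class Pools {
    // Bounded pool with a bounded queue (4th row of the table).
    static ThreadPoolExecutor boundedBoth() {
        return new ThreadPoolExecutor(
                4, 4,                           // n = 4 worker threads
                0L, TimeUnit.MILLISECONDS,      // core threads never time out
                new LinkedBlockingQueue<>(64)); // at most m = 64 queued tasks
    }

    // Bounded pool with direct hand-off, no buffering at all (last row).
    static ThreadPoolExecutor boundedHandoff() {
        return new ThreadPoolExecutor(
                4, 4,
                0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>());      // rendezvous, capacity 0
    }
}
```

Same class, radically different saturation behaviour: the first absorbs bursts of up to 64 tasks, the second rejects the fifth concurrent task immediately unless a saturation policy says otherwise.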
![Page 21: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/21.jpg)
Configuring thread pools (2)

• A client-run saturation policy means that overload causes tasks to be pushed outward from the thread pool (no more accepts, TCP might drop connections, etc. - which ultimately enables clients to handle degradation as well, e.g. via load balancing)
• For example, asynchronous loggers that don't break down when sh** hits the fan!
See also: Brian Goetz, “Java Concurrency in Practice”, Chapter 8.3
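The client-run saturation policy is available out of the box as `ThreadPoolExecutor.CallerRunsPolicy`; a hedged sketch of wiring it up (sizes are illustrative):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// When the pool is saturated, overflow tasks run in the SUBMITTING
// thread. That thread is then too busy to submit more work, which
// throttles the producers instead of dropping tasks - exactly the
// "push the load outward" behaviour the slide describes.
class ThrottledPool {
    static ThreadPoolExecutor create() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>()); // no internal buffering
        pool.setRejectedExecutionHandler(
                new ThreadPoolExecutor.CallerRunsPolicy());
        return pool;
    }
}
```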
![Page 22: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/22.jpg)
Visualising task queues

• Predefined tasks (nudge nudge, actors) will be used to process different data (I don't even need to "nudge nudge" here ..)

One shared queue: [Task A / Payload 1] [Task B / Payload 2] [Task A / Payload 3] [Task C / Payload 4] [Task A / Payload 5] [Task B / Payload 6] [Task A / Payload 7] [Task C / Payload 8] ... consumed by Thread 1, Thread 2, Thread 3, Thread 4

Spot the issue in this model! Hint: think "contention", think "BlockingQueue.take()"

What could the solution look like? Maintain the invariant that we're only allowed to process a message once and only once!

Hint: It's not non-blocking locking!
![Page 23: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/23.jpg)
Organising task queues

• Make a distinction between tasks and data and do some sensible partitioning

Task A's queue: Payload 1, Payload 3, Payload 5, Payload 7, ...
Task B's queue: Payload 2, Payload 6, ...
(served by Thread 1, Thread 2, Thread 3, Thread 4)

• Tasks now have message .. I mean .. payload queues
• n tasks with a queue each means 1/n of the load per queue (if you add new kinds of tasks, this scales; if you just add more messages, not so much - but hold that thought!)
• Tasks can still be executed in parallel (i.e. you don't get away without synchronisation yet)
![Page 24: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/24.jpg)
Does it really make a difference? (1)

• The comparison is silly and totally crazy, but it's a bit like the difference between these two pieces of code (obviously neither is recommended ..)
• Apart from reduced contention, there are all kinds of localities that you're exploiting (cache friendliness, GC friendliness - new objects don't span multiple threads, and so on and so forth)
In case you haven’t had enough background literature yet: http://gee.cs.oswego.edu/dl/papers/fj.pdf
![Page 25: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/25.jpg)
Does it really make a difference? (2)

• In case you still don't believe me, here's a proof by "Pics or it didn't happen!"
• ForkJoin pools organise tasks similarly, hence the comparison
Shamelessly taken from: “Scalability of ForkJoin Pool”, http://letitcrash.com/post/17607272336/scalability-of-fork-join-pool
![Page 26: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/26.jpg)
One more thing!

• The missing commandment: thou shalt not schedule two tasks at the same time if they both need the same locks!
• How would the scheduler know? Well, here's an educated guess: if two tasks are the same task, they will most likely also need the same locks!
• Executing an actor only once at a time therefore has performance reasons (yes, it also makes the actor easier to reason about .. but we wouldn't want to appear lame ..)
• Conversely, if you write different actors, make sure they don't use the same locks (not sure if this is a best practice in Akka, but it's certainly true in Erlang)
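A hedged sketch of the "only once at a time" invariant (my toy code, not Akka's actual mailbox implementation): a single CAS on a `scheduled` flag guarantees that at most one pool thread drains an actor's mailbox at any moment, so the actor's own state needs no locks.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal actor-style mailbox: messages from any thread go into the
// queue, but only the thread that wins the false -> true CAS on
// `scheduled` may process them - one drain at a time, ever.
class MiniActor {
    private final Queue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor pool;

    MiniActor(Executor pool) {
        this.pool = pool;
    }

    void tell(Runnable message) {
        mailbox.add(message);
        trySchedule();
    }

    private void trySchedule() {
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            pool.execute(this::drain);
        }
    }

    private void drain() {
        Runnable message;
        while ((message = mailbox.poll()) != null) {
            message.run(); // processed one at a time, single-threadedly
        }
        scheduled.set(false);
        trySchedule(); // re-check: a message may have raced in after poll()
    }
}
```

The same flag also gives the scheduler the hint from the first bullet for free: the "same task" can never be scheduled twice concurrently.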
![Page 27: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/27.jpg)
Finally!
• The devil’s in the detail and unfortunately some knowledge of these details is required to design scalable architectures.
• In particular, understanding the underlying issues will hopefully help you with designing scalable Akka applications (e.g. applying what you’ve heard, what can you do about too many messages being queued up?)
• Concurrency is hard, yes, but isn't that the beauty of it? Not at all, but never mind!
![Page 28: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/28.jpg)
To sum up, just read this book!
![Page 29: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/29.jpg)
Q&A
![Page 30: Concurrency on the JVM](https://reader033.vdocuments.site/reader033/viewer/2022042613/54b7a93a4a7959b0218b4623/html5/thumbnails/30.jpg)
Thanks!
Bernhard Huemer · bhuemer.at · @bhuemer
23 July 2013
IRIAN Solutions · irian.at