java collections the force awakens - jax london...reducing scope for bugs ~280 bugs in 28 projects...

Java Collections The Force Awakens Darth @RaoulUK Darth @RichardWarburto


Post on 27-May-2020


Page 1: Java Collections The Force Awakens - JAX London...Reducing scope for bugs ~280 bugs in 28 projects including Cassandra, Lucene ~80% check-then-act bugs discovered are put-if-absent

Java Collections: The Force Awakens

Darth @RaoulUK, Darth @RichardWarburto


Collection Problems

Java Episode 8 & 9

Persistent & Immutable Collections

HashMaps


Collection bugs

1. Element access (off-by-one errors, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-act


Scenario 1

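// Throws ConcurrentModificationException: the list is structurally
// modified while the for-each loop's iterator is still walking it.
List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi : jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        jedis.remove(jedi);
    }
}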
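One way to avoid the failure in Scenario 1 is to let the collection (or its iterator) perform the removal. The sketch below is illustrative only; the class and method names are made up for this example and are not from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SafeRemoval {

    // Java 8+: removeIf does the traversal and the removal in one call.
    static List<String> withRemoveIf(List<String> names) {
        List<String> jedis = new ArrayList<>(names);
        jedis.removeIf(jedi -> Character.isLowerCase(jedi.charAt(0)));
        return jedis;
    }

    // Before Java 8: remove through the iterator that is doing the traversal.
    static List<String> withIterator(List<String> names) {
        List<String> jedis = new ArrayList<>(names);
        for (Iterator<String> it = jedis.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return jedis;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(input)); // [Luke]
        System.out.println(withIterator(input)); // [Luke]
    }
}
```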


Scenario 2

Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);
if (views != null) {
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));
}

Check: movieViews.get, views != null

Then

Act: movieViews.put
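Under concurrent access, another thread can change the map between the check and the act. A hedged sketch of one fix: use a ConcurrentHashMap and push both steps into a single atomic call. The class name, method names and the MOVIE value are assumptions made for this example.

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ViewCounter {
    static final String MOVIE = "The Force Awakens"; // hypothetical key

    private final ConcurrentMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();

    // Same behaviour as Scenario 2 (increment only when the key is present),
    // but the check and the act run as one atomic map operation.
    // Returns the new value, or null when the key was absent.
    BigDecimal incrementIfPresent(String movie) {
        return movieViews.computeIfPresent(movie, (key, views) -> views.add(BigDecimal.ONE));
    }

    // Variant that also covers the absent case: start at 1, otherwise add 1.
    BigDecimal increment(String movie) {
        return movieViews.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    public static void main(String[] args) {
        ViewCounter counter = new ViewCounter();
        System.out.println(counter.incrementIfPresent(MOVIE)); // null
        System.out.println(counter.increment(MOVIE));          // 1
        System.out.println(counter.incrementIfPresent(MOVIE)); // 2
    }
}
```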


Reducing scope for bugs

● ~280 bugs in 28 projects including Cassandra, Lucene

● ~80% of the check-then-act bugs discovered are put-if-absent

● Library designers can help by updating APIs as new idioms emerge

● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs

CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
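To make the put-if-absent case concrete, here is a minimal sketch contrasting a racy version with an atomic one. The Registry class and its methods are hypothetical and are not taken from the paper or the talk.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

class Registry {
    private final ConcurrentMap<String, List<String>> membersBySide = new ConcurrentHashMap<>();

    // Racy check-then-act: two threads can both see null and both put,
    // so one thread's list (and the name it added) can be lost.
    void addRacy(String side, String name) {
        List<String> members = membersBySide.get(side);
        if (members == null) {
            members = new CopyOnWriteArrayList<>();
            membersBySide.put(side, members);
        }
        members.add(name);
    }

    // Atomic: computeIfAbsent installs the list at most once per key.
    void addAtomic(String side, String name) {
        membersBySide.computeIfAbsent(side, key -> new CopyOnWriteArrayList<>()).add(name);
    }

    List<String> members(String side) {
        return membersBySide.getOrDefault(side, Collections.emptyList());
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.addAtomic("light", "Luke");
        registry.addAtomic("light", "Leia");
        System.out.println(registry.members("light")); // [Luke, Leia]
    }
}
```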


Collection Problems

Java Episode 8 & 9

Persistent & Immutable Collections

HashMaps


Java 8 Lazy Collection Initialization

Many allocated HashMaps and ArrayLists are never written to, e.g. when used for the Null Object pattern

Java 8 adds Lazy Initialization for the default initialization case

Typically 1-2% reduction in memory consumption

http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0


Java 9 API updates

Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269

java.util.Optional
● ifPresentOrElse(), or(), stream(), getWhenPresent() (the last was a proposal; it was released as the no-argument orElseThrow() in Java 10)
● Optional.get() was proposed for deprecation

java.util.stream.Stream & java.util.stream.Collectors
● takeWhile, dropWhile
● filtering, flatMapping

java.util.concurrent.CompletableFuture
● orTimeout, completeOnTimeout
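A short tour of a few of these Java 9 additions, as a sketch that needs JDK 9 or later. The Java9Tour class and its method names are invented for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Java9Tour {

    // List.of returns an unmodifiable list, so add() is rejected.
    static boolean factoryListRejectsAdd() {
        List<String> jedis = List.of("Luke", "Leia");
        try {
            jedis.add("Rey");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // takeWhile keeps the leading elements that match the predicate.
    static List<Integer> leadingSmall() {
        return Stream.of(1, 2, 3, 10, 1).takeWhile(n -> n < 5).collect(Collectors.toList());
    }

    // dropWhile skips the leading elements that match and keeps the rest.
    static List<Integer> afterLeadingSmall() {
        return Stream.of(1, 2, 3, 10, 1).dropWhile(n -> n < 5).collect(Collectors.toList());
    }

    static String describe(Optional<String> jedi) {
        StringBuilder out = new StringBuilder();
        jedi.ifPresentOrElse(name -> out.append("found ").append(name),
                             () -> out.append("nobody"));
        return out.toString();
    }

    // completeOnTimeout supplies a fallback if nothing completes the future in time.
    static String neverCompleted() {
        return new CompletableFuture<String>()
                .completeOnTimeout("fallback", 50, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(factoryListRejectsAdd());           // true
        System.out.println(leadingSmall());                    // [1, 2, 3]
        System.out.println(afterLeadingSmall());               // [10, 1]
        System.out.println(describe(Optional.of("Yoda")));     // found Yoda
        System.out.println(describe(Optional.empty()));        // nobody
        System.out.println(neverCompleted());                  // fallback
    }
}
```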


Collection Problems

Java Episode 8 & 9

Persistent & Immutable Collections

HashMaps


Categorising Collections

Mutable

Immutable

Non-Persistent Persistent

Unsynchronized Concurrent

Unmodifiable View

Available in Core Library


Mutable

● Popular friends include ArrayList, HashMap, TreeSet

● Memory-efficient modification operations

● State can be accidentally modified

● Can be thread-safe, but requires careful design


Unmodifiable

List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");

List<String> cantChangeMe = Collections.unmodifiableList(jedis);

// java.lang.UnsupportedOperationException
//cantChangeMe.add("Darth Vader");

System.out.println(cantChangeMe); // [Luke Skywalker]

// The view still reflects changes made through the backing list
jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
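An unmodifiable view still changes when its backing list does. If a stable snapshot is wanted, one common approach is to copy before wrapping. This is a sketch; the Snapshot class and snapshotOf method are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Snapshot {

    // Copy first, then wrap: later changes to the source cannot show through.
    static <T> List<T> snapshotOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");

        List<String> frozen = snapshotOf(jedis);
        jedis.add("Darth Vader");

        System.out.println(frozen); // [Luke Skywalker]
        System.out.println(jedis);  // [Luke Skywalker, Darth Vader]
    }
}
```

The price is one full copy per snapshot, which is the cost the persistent collections later in the talk try to avoid.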


Immutable & Non-persistent

● No updates

● Flexibility to convert the source into a more efficient representation

● No locking in context of concurrency

● Satisfies co-variant subtyping requirements

● Can be copied with modifications to create a new version (can be expensive)


Immutable vs. Mutable hierarchy

ImmutableList MutableList

+ ImmutableList<T> toImmutable()

java.util.List

+ MutableList<T> toList()

Eclipse Collections (formerly GS Collections)
https://projects.eclipse.org/projects/technology.collections/

ListIterable


Immutable and Persistent

● Changing the source produces a new version of the collection

● The resulting collection shares structure with the source to avoid full copying on updates


Persistent List (aka Cons)

public final class Cons<T> implements ConsList<T> {

    private final T head;
    private final ConsList<T> tail;

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepends e: the new cell points at this list, nothing is copied
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}
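The slide leaves out the ConsList interface and the empty list. The sketch below fills them in with one possible shape so that structural sharing can be observed; the Nil class, the default methods and the accessor names are assumptions for this example, not the speakers' code.

```java
import java.util.NoSuchElementException;

interface ConsList<T> {
    T head();
    ConsList<T> tail();
    boolean isEmpty();

    // Prepending never touches existing cells; the new cell points at them.
    default ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }

    default int size() {
        int n = 0;
        for (ConsList<T> l = this; !l.isEmpty(); l = l.tail()) {
            n++;
        }
        return n;
    }
}

// Hypothetical empty-list terminator.
final class Nil<T> implements ConsList<T> {
    public T head() { throw new NoSuchElementException("head of empty list"); }
    public ConsList<T> tail() { throw new NoSuchElementException("tail of empty list"); }
    public boolean isEmpty() { return true; }
}

final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public T head() { return head; }
    public ConsList<T> tail() { return tail; }
    public boolean isEmpty() { return false; }
}

class ConsDemo {
    public static void main(String[] args) {
        ConsList<String> base = new Nil<String>().add("Leia").add("Luke");
        ConsList<String> v1 = base.add("Han");
        ConsList<String> v2 = base.add("Rey");

        // Both new versions share the same two original cells.
        System.out.println(v1.tail() == v2.tail()); // true
        System.out.println(base.size());            // 2
        System.out.println(v1.size());              // 3
    }
}
```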


Updating Persistent List

A B C X Y Z

Before


Updating Persistent List

A B C X Y Z

Before

A B D

After

Blue nodes indicate new copies
Purple nodes indicate nodes we wish to update


Concatenating Two Persistent Lists

A B C

X Y Z

Before


Concatenating Two Persistent Lists

- Poor locality due to pointer chasing
- Copying of nodes

A B C

X Y Z

Before

A B C

After


Persistent List

● Structural sharing: no need to copy full structure

● Poor locality due to pointer chasing

● Copying becomes more expensive with larger lists

● Poor Random Access and thus Data Decomposition


Updating Persistent Binary Tree

Before


Updating Persistent Binary Tree

After
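The before/after pictures for the tree are not in this transcript, so here is a minimal sketch of the idea under the assumption that the update is done by path copying: only the nodes between the root and the change are rebuilt, and every other subtree is shared. PNode is a hypothetical, unbalanced binary search tree used purely for illustration.

```java
final class PNode {
    final int key;
    final PNode left;
    final PNode right;

    PNode(int key, PNode left, PNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Returns the root of a new version. Nodes on the path to the insertion
    // point are copied; untouched subtrees are reused as-is.
    static PNode insert(PNode root, int key) {
        if (root == null) {
            return new PNode(key, null, null);
        }
        if (key < root.key) {
            return new PNode(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new PNode(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present: reuse the whole tree
    }

    static boolean contains(PNode root, int key) {
        while (root != null) {
            if (key == root.key) {
                return true;
            }
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        PNode v1 = insert(insert(insert(null, 50), 30), 70);
        PNode v2 = insert(v1, 20);

        System.out.println(contains(v1, 20));     // false: old version is unchanged
        System.out.println(contains(v2, 20));     // true
        System.out.println(v1.right == v2.right); // true: right subtree is shared
    }
}
```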


Persistent Array

How do we get the immutability benefits with performance of mutable variants?


Trie

1. Branch factor not limited to binary

2. Leaf nodes contain actual values

3. Picking the right branch is done by using parts of the key as a lookup

[Figure: a trie with a root node, branches labelled by key characters (a, b, c, e, f) and values stored at the leaves]


Persistent Array (Bitmapped Vector Trie)

[Figure: a 32-way trie. The Level 1 (root) node and each Level 2 node have slots 0, 1, ... 31, which point down to the leaf nodes]


Trade-offs

● Large branching factor facilitates iteration but hinders updates

● Small branching factor facilitates updates but hinders traversal


Java Persistent Collections

- Not available as part of Java Core Library

- Existing projects include
  - PCollections: https://github.com/hrldcpr/pcollections
  - Port of Clojure DS: https://github.com/krukow/clj-ds
  - Port of Scala DS: https://github.com/andrewoma/dexx
  - Coming soon to Javaslang


Memory usage survey

10,000,000 elements, heap < 32GB

int[]: 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)

Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/


Primitive specialised collections

● Collections often hold boxed representations of primitive values

● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces

● Other libraries, e.g. Agrona, Koloboke and Eclipse-Collections, provide primitive specialised collections today.

● Valhalla investigates primitive specialised generics
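As a small illustration of the boxed versus primitive distinction using only the JDK, the sketch below sums the same values from a List<Integer> and from an int[]. The BoxingDemo class and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class BoxingDemo {

    // Each element is an Integer object that must be unboxed before adding.
    static int sumBoxed(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {
            total += value;
        }
        return total;
    }

    // IntStream works on int values directly, with no wrapper objects.
    static int sumPrimitive(int[] values) {
        return IntStream.of(values).sum();
    }

    // Converts a boxed list into a compact primitive array.
    static int[] toPrimitive(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<Integer> boxed = Arrays.asList(1, 2, 3);
        int[] primitive = toPrimitive(boxed);
        System.out.println(sumBoxed(boxed));         // 6
        System.out.println(sumPrimitive(primitive)); // 6
    }
}
```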


Takeaways

● Immutable collections reduce the scope for bugs

● Always a compromise between programming safety and performance

● Performance of persistent data structures is improving


Collection Problems

Java Episode 8 & 9

Persistent & Immutable Collections

HashMaps


HashMaps Basics

...

Han Solo: hash = 72309

Chewbacca: hash = 72309


HashMaps

Chaining: a separate data structure for collision lookups

Probing: store inline and have a probing sequence
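To show what "store inline and have a probing sequence" can look like, here is a deliberately tiny linear-probing map. It is a sketch under strong assumptions: fixed capacity, String keys, int values, no removal and no resizing. ProbingMap is a made-up class, not a library API.

```java
class ProbingMap {
    private final String[] keys;
    private final int[] values;
    private int size;

    // Capacity should be at least 2: one slot is always kept empty.
    ProbingMap(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Returns the slot holding key, or the first empty slot on its probe path.
    private int slotFor(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        // Linear probing: on a collision, step to the next slot, wrapping around.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            // Keeping one slot free guarantees that every probe terminates.
            if (size == keys.length - 1) {
                throw new IllegalStateException("map is full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    Integer get(String key) {
        int i = slotFor(key);
        return keys[i] == null ? null : values[i];
    }

    public static void main(String[] args) {
        ProbingMap map = new ProbingMap(8);
        // "Aa" and "BB" have the same String.hashCode(), so they collide.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.get("Cc")); // null
    }
}
```

A chaining map would instead keep a list (or tree) per bucket and leave the neighbouring slots alone.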


Aliases: Palpatine vs Darth Sidious


HashMaps

Chaining: aka Closed Addressing, aka Open Hashing

Probing: aka Open Addressing, aka Closed Hashing


HashMaps

Chaining: Linked List Based, Tree Based

Probing


java.util.HashMap

Chaining Based HashMap

Historically maintained a linked list of entries in the case of a collision

Problem: with high collision rates, HashMap lookup approaches O(N)


java.util.HashMap in Java 8

Starts by using a List to store colliding values.

Trees used when a bucket holds over 8 elements (and the table has at least 64 buckets)

Tree based nodes use about twice the memory

Make heavy collision lookup case O(log(N)) rather than O(N)

Relies on keys being Comparable

https://github.com/RichardWarburton/map-visualiser
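A way to exercise the heavy-collision case by hand is a key type whose hashCode() is constant and which implements Comparable. The sketch below uses a hypothetical CollidingKey class; it does not inspect HashMap internals, it only builds a map in which every entry lands in one bucket.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42; // every key hashes to the same bucket
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CollidingKey && ((CollidingKey) other).id == id;
    }

    // Gives the map an ordering to use among keys with equal hashes.
    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

class CollisionDemo {
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build(10_000);
        System.out.println(map.size());                      // 10000
        System.out.println(map.get(new CollidingKey(1234))); // 1234
    }
}
```

Removing the Comparable implementation and timing the same lookups is one way to see why the slide says the optimisation relies on keys being Comparable.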


So which HashMap is best?


Benchmarking is about building a mental model of the performance tradeoffs


Example Jar-Jar Benchmark

call get() on a single value for a map of size 1

No model of the different factors that affect things!


Benchmarking HashMaps

Load Factor
Nonlinear key access
Successful vs Failed get()
Hash Collisions
Comparable vs Incomparable keys
Different Keys and Values
Cost of hashCode/equals


Tree Optimization - 60% Collisions


Tree Optimization - 10% Collisions


Probing vs Chaining

Probing Maps usually have lower memory consumption

Small Maps: Probing never has long clusters, can be up to 91% faster.

In large maps with high collision rates, probing scales poorly and can be significantly slower.


Takeaways

There’s no clearcut “winner”.

JDK Implementations try to minimise worst case.

Linear Probing requires a good hashCode() distribution. Often hashmaps “precondition” their hashes.

IdentityHashMap has low memory consumption and is fast; use it where comparing keys by reference (==) is what you want.

3rd Party libraries offer probing HashMaps, eg Koloboke & Eclipse-Collections.
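One caution before swapping in IdentityHashMap: it compares keys by reference rather than with equals(), so it is not a drop-in replacement for HashMap. A small sketch of the difference (IdentityDemo is a made-up name):

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityDemo {

    // Inserts two keys that are equal() but are distinct objects.
    static int distinctKeys(Map<String, Integer> map) {
        String first = new String("Yoda");
        String second = new String("Yoda");
        map.put(first, 1);
        map.put(second, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(new HashMap<>()));         // 1
        System.out.println(distinctKeys(new IdentityHashMap<>())); // 2
    }
}
```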


Conclusions


Interface Popularity

List 1576210

Set 980763

Map 803171

Queue 62024

Deque 3464

SortedSet 9121

NavigableSet 1735

SortedMap 8677

NavigableMap 1484


Implementation Popularity

ArrayList 225029

LinkedList 26850

ArrayDeque 1086

HashSet 68940

TreeSet 10108

EnumSet 10512

HashMap 137610

TreeMap 7734

WeakHashMap 3473

IdentityHashMap 2443

EnumMap 1904


Evolution can be interesting ...Java 1.2 Java 10?


Any Questions?

www.iteratrlearning.com

● Modern Development with Java 8
● Reactive and Asynchronous Java
● Java Software Development Bootcamp


Further reading

Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf

Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf

Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf

RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf


Further reading

Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/[email protected]/msg02147.html

Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html

Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens


Further reading

Debian code search used for popularity
https://codesearch.debian.net/


Small HashMaps

Many HashMaps are small or empty

Lazy Initialization In Java 8+

Specialised Implementations
● Collections.singleton*/Collections.empty*
● Collectors.partitioningBy()
● Specialised Eclipse Collections (eg Doubleton)
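The JDK entries in that list can be tried directly. In this sketch the SmallMaps class, its method names and the sample data are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SmallMaps {

    // Immutable empty map; no HashMap needs to be allocated.
    static Map<String, Integer> none() {
        return Collections.emptyMap();
    }

    // Immutable map holding exactly one entry.
    static Map<String, Integer> one(String key, int value) {
        return Collections.singletonMap(key, value);
    }

    // partitioningBy returns a map that always has exactly the keys true and false.
    static Map<Boolean, List<String>> bySide(List<String> names) {
        return names.stream().collect(Collectors.partitioningBy(name -> name.startsWith("Darth")));
    }

    public static void main(String[] args) {
        System.out.println(none());            // {}
        System.out.println(one("Yoda", 900));  // {Yoda=900}
        Map<Boolean, List<String>> sides = bySide(Arrays.asList("Darth Vader", "Luke"));
        System.out.println(sides.get(true));   // [Darth Vader]
        System.out.println(sides.get(false));  // [Luke]
    }
}
```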


Probing Sequence

Linear- Cache Locality

Quadratic- Tree

Clever ideas


Implementing Persistent Collections

Fat node
● Nodes store updated values in an internal list
● Different versions accessible using an order (e.g. timestamp)

Path copying
● Copy path leading to updated node
● Share rest with previous version
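A minimal sketch of the fat node idea for a single cell: every write is recorded under a version number, and a read asks for the value as of some version. FatNode is a hypothetical class, and using a TreeMap as the "internal list" is an assumption made to keep the example short.

```java
import java.util.Map;
import java.util.TreeMap;

class FatNode<T> {
    // Version number -> value written at that version.
    private final TreeMap<Integer, T> versions = new TreeMap<>();

    void set(int version, T value) {
        versions.put(version, value);
    }

    // Value as of the given version: the latest write at or before it,
    // or null if nothing had been written yet.
    T get(int version) {
        Map.Entry<Integer, T> entry = versions.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        FatNode<String> apprentice = new FatNode<>();
        apprentice.set(1, "Anakin");
        apprentice.set(3, "Darth Vader");

        System.out.println(apprentice.get(0)); // null
        System.out.println(apprentice.get(2)); // Anakin
        System.out.println(apprentice.get(3)); // Darth Vader
    }
}
```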


Benchmarking HashMaps

Test different Assumptions + Behaviours

Understand costs, don’t just measure them

Be Scientific

Use a framework

Peer Review - Wisdom of crowds


(h = key.hashCode()) ^ (h >>> 16);

Preconditioning
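Why fold the high bits down? A power-of-two table picks a bucket from the low bits only, so keys that differ only in their high bits would otherwise all collide. The sketch below shows the effect; the HashSpread class and the bucket helper are illustrative names, not JDK API.

```java
class HashSpread {

    // XOR the high 16 bits of the hash code into the low 16 bits.
    static int spread(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose size is a power of two.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int[] keys = {0x10000, 0x20000, 0x30000}; // differ only above bit 15
        for (int key : keys) {
            int raw = bucket(Integer.hashCode(key), 16);
            int preconditioned = bucket(spread(key), 16);
            System.out.println(raw + " -> " + preconditioned);
        }
        // prints:
        // 0 -> 1
        // 0 -> 2
        // 0 -> 3
    }
}
```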


CopyOnWrite

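public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // writers are serialised
    try {
        Object[] elements = getArray();
        int len = elements.length;
        // Copy the whole backing array, with room for one more element
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        // Publish the new array; readers of the old array are unaffected
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}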
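From the caller's side, the copy on every write means an iterator keeps working on the array it started with. A brief sketch using java.util.concurrent.CopyOnWriteArrayList (the CopyOnWriteDemo class is a made-up name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {

    // Returns two lists: the elements the loop saw, and the final contents.
    static List<List<String>> iterateWhileAdding() {
        List<String> jedis = new CopyOnWriteArrayList<>(Arrays.asList("Luke", "Yoda"));
        List<String> seen = new ArrayList<>();
        for (String jedi : jedis) {       // iterates over the array as it was at loop start
            seen.add(jedi);
            jedis.add(jedi + " (clone)"); // installs a new array; no ConcurrentModificationException
        }
        return Arrays.asList(seen, jedis);
    }

    public static void main(String[] args) {
        List<List<String>> result = iterateWhileAdding();
        System.out.println(result.get(0)); // [Luke, Yoda]
        System.out.println(result.get(1)); // [Luke, Yoda, Luke (clone), Yoda (clone)]
    }
}
```

Each add copies the whole array, so this trade-off suits lists that are read far more often than they are written.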


Persistent Array (Bitmapped Vector Trie)

● Uses the bit pattern of the index number for efficient arithmetic / lookup of elements

● A branching factor of 32 and a depth of 5 can store ~33 million elements (32^5 = 33,554,432) and requires 5 lookups to find an element: “practically constant”
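The index arithmetic can be sketched in a few lines: with a branching factor of 32, each level of the trie consumes 5 bits of the index. TriePath and its methods are hypothetical names for this example, and the sketch only computes which slot to follow at each level; it does not build a trie.

```java
import java.util.Arrays;

class TriePath {
    static final int BITS = 5;                // 2^5 = 32-way branching
    static final int MASK = (1 << BITS) - 1;  // 0b11111

    // Slot to follow at each level, root first, for a trie of the given depth.
    static int[] path(int index, int depth) {
        int[] slots = new int[depth];
        for (int level = 0; level < depth; level++) {
            int shift = BITS * (depth - 1 - level);
            slots[level] = (index >>> shift) & MASK;
        }
        return slots;
    }

    // Number of elements a trie of the given depth can address.
    static long capacity(int depth) {
        return 1L << (BITS * depth);
    }

    public static void main(String[] args) {
        // 1057 = 1 * 32 * 32 + 1 * 32 + 1
        System.out.println(Arrays.toString(path(1057, 3))); // [1, 1, 1]
        System.out.println(capacity(5));                    // 33554432
    }
}
```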