A world where 1 ms is worth €100M (Un monde où 1 ms vaut 100 M€) - Devoxx France 2015
TRANSCRIPT
-
@Alex_Victoor @ThierryAbalea #sginsideit
A world where 1 ms is worth 100M euros
-
Speakers
Alexandre Victoor (@Alex_Victoor)
Thierry Abaléa (@ThierryAbalea)
-
Big Data != Web
5K status updates / sec
6K tweets / sec
1.6M mails / sec
40K searches / sec
740K messages / sec
1.1M US options trades & quotes / sec
-
Faster than light
-
Latency at every level
APPLICATION
JVM
OS
NETWORK
DISK
CPU
MEMORY
-
A warm-up quiz
-
int SIZE = 1000000;
int NB_ARRAY = 50;
long[][] longs = new long[NB_ARRAY][SIZE];
long result = 0;
for (int j=0; j
-
Memory layout
The first program is the fastest!
47 ms
796 ms
Measure
-
Processor
Core 1: registers and execution units (< 1 ns), L1 cache (~ 1 ns), L2 cache (~ 3 ns)
Core 2: registers and execution units, L1 cache, L2 cache
L3 cache, shared by both cores (~ 12 ns)
-
int SIZE = 1000000;
int NB_ARRAY = 50;
long[][] longs = new long[NB_ARRAY][SIZE];
long result = 0;
for (int i=0; i
-
int SIZE = 1000000;
int NB_ARRAY = 50;
long[][] longs = new long[NB_ARRAY][SIZE];
long result = 0;
for (int j = 0; j
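The loop bodies of the two quiz programs are cut off in this transcript. The sketch below is only an assumption about what they compare: the same sum computed row by row (the inner loop walks one long[] sequentially) and column by column (each access jumps to a different long[]). The class and method names are hypothetical, and SIZE is scaled down from the slide's 1000000 to keep the heap small.

```java
// Hypothetical reconstruction: same sum, two traversal orders.
class TraversalOrderSketch {
    static final int SIZE = 100_000;   // assumption: scaled down from the slide's 1000000
    static final int NB_ARRAY = 50;

    // Row by row: consecutive accesses are adjacent in memory (cache friendly).
    static long sumRowByRow(long[][] longs) {
        long result = 0;
        for (int i = 0; i < longs.length; i++) {
            for (int j = 0; j < longs[i].length; j++) {
                result += longs[i][j];
            }
        }
        return result;
    }

    // Column by column: consecutive accesses land in different arrays.
    static long sumColumnByColumn(long[][] longs) {
        long result = 0;
        for (int j = 0; j < longs[0].length; j++) {
            for (int i = 0; i < longs.length; i++) {
                result += longs[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] longs = new long[NB_ARRAY][SIZE];
        long t0 = System.nanoTime();
        long rows = sumRowByRow(longs);
        long t1 = System.nanoTime();
        long columns = sumColumnByColumn(longs);
        long t2 = System.nanoTime();
        // Naive timing, for illustration only: a JMH benchmark is the reliable way to measure.
        System.out.printf("row by row: %d us, column by column: %d us (sums %d / %d)%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000, rows, columns);
    }
}
```

Both methods return the same value; only the memory access pattern differs, which is what the cache hierarchy above rewards or punishes.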
-
Measuring: micro-benchmarks
-
OpenJDK JMH
$ mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DgroupId=org.sample \
    -DartifactId=devoxx-bench \
    -Dversion=1.0
-
Setting the scene
-
Selling financial products (before)
-
RFQ: Request For Quote
-
API - live
CLIENT
BANK
WEB
Markets
SALES +
Trading
Risk (controls)
Booking
-
Java?
-
Simple to get started?
-
Random rand = new Random();
IntStream stream = rand.ints(10 * 1024 * 1024, 0, 2);
int sum = stream.sum();
int sum = stream.parallel().sum();
113 ms
-
Random rand = new Random();
IntStream stream = rand.ints(10 * 1024 * 1024, 0, 2);
int sum = stream.sum();
int sum = stream.parallel().sum(); 1167 ms
157 ms
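A stream can only be consumed once, so the two sum() calls on the slide cannot run against the same IntStream object. The following is a minimal sketch that builds one stream per variant; the class name, the method names and the fixed seed are assumptions added for illustration. The java.util.Random documentation warns that sharing one instance across threads may cause contention, which is one plausible reason for the parallel version being slower here.

```java
import java.util.Random;

// Hypothetical sketch: sequential vs parallel sum over Random.ints.
class RandomStreamSketch {
    static final int COUNT = 10 * 1024 * 1024;

    static int sequentialSum(long seed) {
        return new Random(seed).ints(COUNT, 0, 2).sum();
    }

    static int parallelSum(long seed) {
        // All worker threads pull their values from the same Random instance.
        return new Random(seed).ints(COUNT, 0, 2).parallel().sum();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        int sequential = sequentialSum(42L);
        long t1 = System.nanoTime();
        int parallel = parallelSum(42L);
        long t2 = System.nanoTime();
        // Naive timing, for illustration only.
        System.out.printf("sequential: %d ms (sum %d), parallel: %d ms (sum %d)%n",
                (t1 - t0) / 1_000_000, sequential, (t2 - t1) / 1_000_000, parallel);
    }
}
```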
-
First architecture
Event loop, I/O, I/O
-
Queue
-
Lock
-
Our peak hours: Open & Close
-
Cost of the lock: system calls
-
Cost of the lock: context switches
-
Non-blocking algorithms: at least one thread makes progress
-
Non-blocking algorithms: no critical sections, locks, mutexes, spin-locks, ...
-
j.u.c.ConcurrentLinkedQueue
-
MPSC queue
P1, P2, P3: producers sharing one queue
C: single consumer
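The multi-producer, single-consumer setup can be sketched with j.u.c.ConcurrentLinkedQueue, which is non-blocking. This is an illustrative sketch, not the speakers' code: the class name, the method name and the message format are assumptions.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: several producers, one consumer, no lock.
class MpscSketch {
    static int run(int producers, int messagesPerProducer) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final String name = "P" + (p + 1);
            threads[p] = new Thread(() -> {
                for (int m = 0; m < messagesPerProducer; m++) {
                    queue.offer(name + "-" + m);   // never blocks
                }
            });
            threads[p].start();
        }
        // The calling thread plays the single consumer C.
        int expected = producers * messagesPerProducer;
        int consumed = 0;
        while (consumed < expected) {
            if (queue.poll() != null) {
                consumed++;
            } else {
                Thread.yield();                    // queue momentarily empty
            }
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println("consumed " + run(3, 100_000) + " messages");
    }
}
```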
-
SPSC queues
P1, P2, P3: one queue per producer
C: single consumer
Single Writer Principle
-
Second architecture
I/O, I/O
-
Lamport Queue
Concurrent Reading and Writing, Leslie Lamport, 1977
Buffer: E E null null null null null null null E
producerIndex = 42, offset = 2
consumerIndex = 39, offset = 9
-
import java.util.AbstractQueue;

public final class LamportQueue1<E> extends AbstractQueue<E> {
    private final E[] buffer;
    private volatile long producerIndex = 0;
    private volatile long consumerIndex = 0;

    @SuppressWarnings("unchecked")
    public LamportQueue1(int capacity) {
        buffer = (E[]) new Object[capacity];
    }

    @Override
    public int size() {
        return (int) (producerIndex - consumerIndex);
    }
-
    @Override
    public boolean offer(final E e) {
        if (size() == buffer.length) {
            return false;
        }
        final int offset = (int) (producerIndex % buffer.length);
        buffer[offset] = e;
        producerIndex++;
        return true;
    }
-
    @Override
    public E poll() {
        if (consumerIndex == producerIndex) {
            return null;
        }
        final int offset = (int) (consumerIndex % buffer.length);
        final E e = buffer[offset];
        buffer[offset] = null;
        consumerIndex++;
        return e;
    }
-
Performance (MOps/s)
BlockingQueue (BQ): 4
Lamport v1 (LQ1): 16 (x 4)
-
Correct?
-
private final E[] buffer;
private volatile long producerIndex = 0;
private volatile long consumerIndex = 0;
-
Producer Thread
@Override
public boolean offer(final E e) {
    if (size() == buffer.length) {
        // queue is full
        return false;
    }
    final int offset = (int) (producerIndex % buffer.length);
    buffer[offset] = e;
    producerIndex++;
    return true;
}
Consumer Thread
@Override
public E poll() {
    if (consumerIndex == producerIndex) {
        // queue is empty
        return null;
    }
    final int offset = (int) (consumerIndex % buffer.length);
    final E e = buffer[offset];
    buffer[offset] = null;
    consumerIndex++;
    return e;
}
-
Java 1.4
Producer Thread
// consumerIndex = 2
// producerIndex = 2
// e = 777
final int offset = (int) (producerIndex % buffer.length); // 2
buffer[offset] = e; // buffer[2] = 777
producerIndex++; // 3
Consumer Thread
if (consumerIndex == producerIndex) { // 2 != 3
    // queue is empty
    return null;
}
final int offset = (int) (consumerIndex % buffer.length); // 2
final E e = buffer[offset]; // null (buffer[2])
return e; // return null
-
Java 1.5 & +
Producer Thread
// consumerIndex = 2
// producerIndex = 2
// e = 777
final int offset = (int) (producerIndex % buffer.length); // 2
buffer[offset] = e; // buffer[2] = 777
producerIndex++; // 3
Consumer Thread
if (consumerIndex == producerIndex) { // 2 != 3
    // queue is empty
    return null;
}
final int offset = (int) (consumerIndex % buffer.length); // 2
final E e = buffer[offset]; // 777 (buffer[2])
return e; // return 777
Happens-Before
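Since Java 5, a write to a volatile field happens-before every later read of that field, so the plain write into the buffer is visible to a consumer that has observed the incremented producerIndex. The sketch below assembles the fragments shown so far into a compilable single-producer, single-consumer queue. The peek() and iterator() methods, the transfer() helper and the class name are assumptions added so that the example runs; they do not come from the slides.

```java
import java.util.AbstractQueue;
import java.util.Iterator;

// Hypothetical, runnable assembly of the Lamport queue v1 (one producer, one consumer only).
final class LamportQueueSketch<E> extends AbstractQueue<E> {
    private final E[] buffer;
    private volatile long producerIndex = 0; // written by the producer thread only
    private volatile long consumerIndex = 0; // written by the consumer thread only

    @SuppressWarnings("unchecked")
    LamportQueueSketch(int capacity) {
        buffer = (E[]) new Object[capacity];
    }

    @Override
    public int size() {
        return (int) (producerIndex - consumerIndex);
    }

    @Override
    public boolean offer(final E e) {
        if (size() == buffer.length) {
            return false;                        // queue is full
        }
        final int offset = (int) (producerIndex % buffer.length);
        buffer[offset] = e;                      // plain write ...
        producerIndex++;                         // ... published by this volatile write
        return true;
    }

    @Override
    public E poll() {
        if (consumerIndex == producerIndex) {    // volatile read of producerIndex
            return null;                         // queue is empty
        }
        final int offset = (int) (consumerIndex % buffer.length);
        final E e = buffer[offset];              // sees the producer's write
        buffer[offset] = null;
        consumerIndex++;
        return e;
    }

    @Override
    public E peek() {                            // added so the class compiles
        if (consumerIndex == producerIndex) {
            return null;
        }
        return buffer[(int) (consumerIndex % buffer.length)];
    }

    @Override
    public Iterator<E> iterator() {              // not needed for this sketch
        throw new UnsupportedOperationException();
    }

    // Sends 0..count-1 from a producer thread and checks the order on the calling thread.
    static long transfer(int count, int capacity) {
        LamportQueueSketch<Integer> queue = new LamportQueueSketch<>(capacity);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                while (!queue.offer(i)) {
                    Thread.yield();              // queue full: let the consumer run
                }
            }
        });
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < count) {
            Integer value = queue.poll();
            if (value == null) {
                Thread.yield();                  // queue empty: let the producer run
                continue;
            }
            if (value != received) {
                throw new IllegalStateException("expected " + received + " but got " + value);
            }
            sum += value;
            received++;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + transfer(1_000_000, 1024));
    }
}
```

The increments on the volatile indexes are not atomic, which is acceptable here only because each index has a single writer.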
-
Another quiz!
-
Which one is the fastest?
public static final int SIZE = 256 * 1024;
private int[] data = new int[SIZE];

@Setup
public void init() {
    Random rand = new Random();
    for (int i = 0; i < SIZE; i++) {
        data[i] = rand.nextInt(100) - 50;
    }
}
-
Which one is the fastest?
1
@Benchmark
public int mathAbs() {
    int sum = 0;
    for (int x : data) {
        sum += Math.abs(x);
    }
    return sum;
}
2
@Benchmark
public int customAbs() {
    int sum = 0;
    for (int x : data) {
        if (x < 0) {
            sum -= x;
        } else {
            sum += x;
        }
    }
    return sum;
}
-
Instruction pipeline
The first program is the fastest!
273 µs
1180 µs
Measure
-
Figure: instructions waiting in program order go through the pipeline stages (Fetch, Decode, Execute, Write-back), clock cycles 1 to 6
-
Figure: a "jump if sign" instruction goes through the pipeline stages (Fetch, Decode, Execute, Write-back), clock cycles 1 to 4
Bad prediction: into the trash
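With random signs, the branch in customAbs is hard to predict, and each bad prediction throws away work already in the pipeline. As an illustration only, here is one way to compute an absolute value without any branch, using the sign bit as a mask. The class and method names are hypothetical, and whether the JIT compiles Math.abs to similar branch-free code depends on the JVM and the CPU.

```java
import java.util.Random;

// Hypothetical sketch: absolute value with and without a conditional jump.
class AbsSumSketch {
    // x >> 31 is 0 for x >= 0 and -1 (all bits set) for x < 0.
    static int branchFreeAbs(int x) {
        int mask = x >> 31;
        return (x ^ mask) - mask;
    }

    static int sumWithBranch(int[] data) {
        int sum = 0;
        for (int x : data) {
            if (x < 0) {
                sum -= x;
            } else {
                sum += x;
            }
        }
        return sum;
    }

    static int sumBranchFree(int[] data) {
        int sum = 0;
        for (int x : data) {
            sum += branchFreeAbs(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same data shape as the quiz: values between -50 and 49.
        int[] data = new int[256 * 1024];
        Random rand = new Random();
        for (int i = 0; i < data.length; i++) {
            data[i] = rand.nextInt(100) - 50;
        }
        System.out.println(sumWithBranch(data) + " == " + sumBranchFree(data));
    }
}
```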
-
Measuring latency
-
Figure: latency over time, with percentiles
Mean: 4
50%: 3.5
90%: 6
-
HdrHistogram
Histogram histo = new Histogram(5);
histo.recordValue(end - start);
histo.getValueAtPercentile(99.0); // percentiles are given on a 0-100 scale
histo.outputPercentileDistribution(os, 1D);
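To see why percentiles say more than a mean, here is a minimal sketch that uses only the JDK, with a sort and the nearest-rank method. It is not how HdrHistogram works internally, and the class name, method names and sample latencies are made up for the example.

```java
import java.util.Arrays;

// Hypothetical sketch: mean vs nearest-rank percentiles on a small sample.
class PercentileSketch {
    static double mean(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // percentile is expressed on a 0-100 scale.
    static long percentile(long[] values, double percentile) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        // Made-up latencies: nine fast calls and one slow outlier.
        long[] latencies = {1, 2, 3, 4, 5, 6, 7, 8, 9, 100};
        System.out.println("mean = " + mean(latencies));
        System.out.println("50%  = " + percentile(latencies, 50));
        System.out.println("90%  = " + percentile(latencies, 90));
        System.out.println("99%  = " + percentile(latencies, 99));
    }
}
```

On this sample the mean (14.5) is larger than 90% of the observations and still far below the worst one, so it describes neither the typical call nor the slow one.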
-
Queue v2
-
Cost of volatile
-
Figure: a core with its execution units, registers, store buffer, L1 cache and L2 cache
Volatile store: the store buffer holds S1, S2, S3, S4, S5
-
import java.util.AbstractQueue;
import java.util.concurrent.atomic.AtomicLong;

public final class LamportQueue2<E> extends AbstractQueue<E> {
    private final E[] buffer;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    @SuppressWarnings("unchecked")
    public LamportQueue2(int capacity) {
        buffer = (E[]) new Object[capacity];
    }

    @Override
    public int size() {
        return (int) (producerIndex.get() - consumerIndex.get());
    }
-
    @Override
    public boolean offer(final E e) {
        if (size() == buffer.length) {
            return false;
        }
        // producerIndex is an AtomicLong: read its value with get()
        final int offset = (int) (producerIndex.get() % buffer.length);
        buffer[offset] = e;
        producerIndex.lazySet(producerIndex.get() + 1);
        return true;
    }
-
    @Override
    public E poll() {
        // compare the values, not the AtomicLong references
        if (consumerIndex.get() == producerIndex.get()) {
            return null;
        }
        final int offset = (int) (consumerIndex.get() % buffer.length);
        final E e = buffer[offset];
        buffer[offset] = null;
        consumerIndex.lazySet(consumerIndex.get() + 1);
        return e;
    }
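The pattern lazySet(get() + 1) is a read followed by a write, not an atomic increment; it is safe here only because each index has a single writer. lazySet is an ordered store that is typically cheaper than a full volatile write. The counter below is a minimal sketch of that single-writer idea; its class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a counter with exactly one writer thread, updated with lazySet.
class SingleWriterCounterSketch {
    private final AtomicLong value = new AtomicLong();

    // Must only be called from the single writer thread.
    void increment() {
        value.lazySet(value.get() + 1);
    }

    long get() {
        return value.get();
    }

    // One writer increments; the calling thread reads and checks that the value never decreases.
    static long countTo(long increments) {
        SingleWriterCounterSketch counter = new SingleWriterCounterSketch();
        Thread writer = new Thread(() -> {
            for (long i = 0; i < increments; i++) {
                counter.increment();
            }
        });
        writer.start();
        long last = 0;
        while (last < increments) {
            long seen = counter.get();
            if (seen < last) {
                throw new IllegalStateException("counter went backwards");
            }
            last = seen;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("final value = " + countTo(10_000_000L));
    }
}
```

With two writer threads the same code would lose updates, which is why this style fits the single-producer, single-consumer queue.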
-
Performance (MOps/s)
BlockingQueue (BQ): 4
Lamport v1 (LQ1): 16
Lamport v2 (LQ2): 43 (x 10)
-
LOGS
-
Logs, several approaches
Standard file appender, blocking writes
Buffered appender
Asynchronous appender
Memory-mapped file appender
-
Memory-mapped file
Virtual memory, physical memory, disk
-
Logs & mmap
MEMORY (OFF HEAP)
BUFFER
FILE
Data, empty
System call
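In plain Java, a file can be mapped into memory with FileChannel.map; writes then go to the mapped buffer instead of through a write call per message. The sketch below is an assumption-laden illustration, not the Chronicle appender: the class name, the 4096-byte region and the length-prefixed record layout are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: write one log record through a memory-mapped file, then read it back.
class MmapLogSketch {
    static final int REGION_SIZE = 4096; // assumption: one small mapped region

    static String writeThenRead(String message) {
        try {
            Path file = Files.createTempFile("mmap-log-sketch", ".bin");
            file.toFile().deleteOnExit();
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping beyond the current end of the file grows it to REGION_SIZE bytes.
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_SIZE);
                buffer.putInt(payload.length);   // length prefix
                buffer.put(payload);             // record bytes, written to off-heap memory
                buffer.force();                  // ask the OS to flush the mapped pages
            }
            // Read the file back through the regular file API.
            ByteBuffer onDisk = ByteBuffer.wrap(Files.readAllBytes(file));
            byte[] read = new byte[onDisk.getInt()];
            onDisk.get(read);
            return new String(read, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead("New quote received"));
    }
}
```

The force() call is there only to make the read-back deterministic; a latency-sensitive logger would normally leave flushing to the operating system.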
-
perf-test-txt-chronicle
Chronicle Logback Appender
-
Text vs binary: market data
A 12-character ISIN code
A buy price (bid)
A sell price (ask)
Isin=FR0000120271 Bid=46.575 Ask=46.590
-
Text vs binary
Isin=FR0000120271 Bid=46.575 Ask=46.590
Text (UTF-8)
49 73 69 6E 3D 46 52 30 30 30 30 31 32 30 32 37 31 20 42 69 64 3D 34 36 2E 35 37 35 20 41 73 6B 3D 34 36 2E 35 39 30
39 bytes
Binary
46 52 30 30 30 30 31 32 30 32 37 31 40 47 49 99 99 99 99 9A 40 47 4B 85 1E B8 51 EC
28 bytes (20 with ints)
-
Serialization vs toString()
public class Quote {
    String code;
    double bid;
    double ask;
}

return "Quote {" +
        "code='" + code + '\'' +
        ", bid=" + bid +
        ", ask=" + ask +
        '}';
-
Serialization vs toString()
public class Quote {
    String code;
    double bid;
    double ask;
}

ByteBuffer buffer = ByteBuffer.allocate(28);
buffer.put(code.getBytes());
buffer.putDouble(bid);
buffer.putDouble(ask);
buffer.flip();
out.write(buffer.array());
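The 39-byte and 28-byte figures can be checked with a small program. This sketch encodes the example quote both ways; the class and method names are hypothetical, and the text format simply mirrors the line shown on the slide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch: the same quote encoded as text and as binary.
class QuoteEncodingSketch {
    static byte[] toText(String code, double bid, double ask) {
        String line = String.format(Locale.ROOT, "Isin=%s Bid=%.3f Ask=%.3f", code, bid, ask);
        return line.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] toBinary(String code, double bid, double ask) {
        // 12 ASCII bytes for the ISIN code + two 8-byte doubles (big-endian by default)
        ByteBuffer buffer = ByteBuffer.allocate(12 + 8 + 8);
        buffer.put(code.getBytes(StandardCharsets.US_ASCII));
        buffer.putDouble(bid);
        buffer.putDouble(ask);
        return buffer.array();
    }

    static double decodeBid(byte[] binary) {
        return ByteBuffer.wrap(binary).getDouble(12);
    }

    static double decodeAsk(byte[] binary) {
        return ByteBuffer.wrap(binary).getDouble(20);
    }

    public static void main(String[] args) {
        byte[] text = toText("FR0000120271", 46.575, 46.590);
        byte[] binary = toBinary("FR0000120271", 46.575, 46.590);
        System.out.println("text: " + text.length + " bytes, binary: " + binary.length + " bytes");
    }
}
```

Beyond the size, the binary form avoids formatting the doubles as decimal text on the logging thread.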
-
perf-test-txt-chronicle
perf-test-bin-chronicle
Binary logging with Chronicle
-
logger.info("New quote received - {}", quote);
-
Logger benchmark
Logging 100,000 quote messages
Measuring the latency induced by the call to the logger
System Under Test
CPU: 2x (Xeon E5-2630, 2.3 GHz 2.8 GHz, 6 cores)
RAM: 64 GB DDR3 PC10600
HDD: 2x 300 GB in RAID1, 10K rpm, SAS 6 Gb/s, 1 GB cache
Network: 1GbE
OS: RHEL 6.4 with a 2.6.32 kernel
-
FileAppender vs BinaryIndexedChronicleAppender
1us
5us
2us
10us
-
Conclusion
-
Last quiz: what to remember
It is always the first program that is the fastest
Do not trust your intuition: when it comes to performance, you have to measure!!!
Be curious, hardware is not dirty
-
Questions?
-
References
Talk: Lock-free Algorithms for Ultimate Performance, Martin Thompson, https://yow.eventer.com/yow-2012-1012/lock-free-algorithms-for-ultimate-performance-by-martin-thompson-1250
Talk: Queue evolution: from 10M to 470M ops/sec, Nitsan Wakart, https://vimeo.com/100197431 / https://github.com/nitsanw/QueueEvolution
Talk: How NOT to Measure Latency, Gil Tene, http://www.infoq.com/presentations/latency-pitfalls
JMH, THE Micro Benchmark Tool for Java, http://openjdk.java.net/projects/code-tools/jmh/
HdrHistogram, THE Latency Measurement & Plotting Tool, https://github.com/HdrHistogram/HdrHistogram
Blog posts related to Mechanical Sympathy, Martin Thompson, http://mechanical-sympathy.blogspot.com/
Blog posts related to Java Performance, Nitsan Wakart, http://psy-lob-saw.blogspot.com/
Blog posts related to Java Performance, Peter Lawrey, http://vanillajava.blogspot.com/
Chronicle, OpenHFT's Tools, http://openhft.net/
The Mechanical Sympathy Forum, https://groups.google.com/forum/#!forum/mechanical-sympathy
Code of this presentation, https://github.com/ThierryAbalea/high-performance-2015-talk
Slides of this presentation, http://www.slideshare.net/ThierryAbalea/un-monde-o-1-ms-vaut-100-m-devoxx-france-2015