benchmarking (ricon 2014)

Benchmarking: You're Doing It Wrong
Aysylu Greenberg (@aysylu22)

Upload: aysylu-greenberg

Posted on 14-Jun-2015


DESCRIPTION

Knowledge of how to set up good benchmarks is invaluable in understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. Done right, benchmarks guide teams to improve the performance of their systems. Done wrong, hours of effort may yield a worse-performing application, upset customers, or worse. In this talk, we will discuss what you need to know to write better benchmarks for distributed systems. We will look at examples of bad benchmarks and the biases that can invalidate the measurements, in the hope of applying our new-found skills correctly and avoiding such pitfalls in the future.

TRANSCRIPT

Page 1: Benchmarking (RICON 2014)

Benchmarking:  You’re Doing It Wrong

Aysylu  Greenberg  @aysylu22  

Page 2: Benchmarking (RICON 2014)
Page 3: Benchmarking (RICON 2014)

To  Write  Good  Benchmarks…  

Need  to  be  Full  Stack  

Page 4: Benchmarking (RICON 2014)

   

Your process vs. Goal
Your process vs. Best Practices

 

Benchmark  =  How  Fast?  

Page 5: Benchmarking (RICON 2014)

Today  

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You're wrong about machines
  - You're wrong about stats
  - You're wrong about what matters

• Becoming Less Wrong
• Having Fun with Riak

Page 6: Benchmarking (RICON 2014)

HOW  NOT  TO  WRITE  BENCHMARKS  

Page 7: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache
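The flawed benchmark from the slide can be written out as a minimal Python sketch. Everything here is illustrative: `fetch_image` is a hypothetical stand-in for the real HTTP request, simulated with a short sleep so the sketch runs. Note how it hits one hot URL, starts measuring immediately, and reports a single mean: exactly the mistakes the following slides pick apart.

```python
import statistics
import time

def fetch_image(url):
    # Hypothetical stand-in for the real HTTP request to the image
    # server; simulated with a short sleep so the sketch is runnable.
    time.sleep(0.001)
    return b"..."

def naive_benchmark(url, accesses=1000, runs=3):
    # The benchmark exactly as the slide describes it: one hot URL,
    # no warmup, start measuring immediately, mean over all accesses.
    latencies = []
    for _ in range(runs):
        for _ in range(accesses):
            start = time.perf_counter()
            fetch_image(url)
            latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies)

mean_latency = naive_benchmark("http://example.com/cat.png", accesses=10, runs=3)
```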

Page 8: Benchmarking (RICON 2014)

WHAT’S  WRONG  WITH  THIS  BENCHMARK?    

Page 9: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  THE  MACHINE    

Page 10: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

•  Cache,  cache,  cache,  cache!  

Page 11: Benchmarking (RICON 2014)

It’s  Caches  All  The  Way  Down  

Web Request → Server → S3 Cache

Page 12: Benchmarking (RICON 2014)

It’s  Caches  All  The  Way  Down  

Page 13: Benchmarking (RICON 2014)

Caches in Benchmarks (Prof. Saman Amarasinghe, MIT, 2009)


Page 18: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 19: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
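One fix for the warmup pitfall is to run the operation untimed until caches, JITs, and connection pools settle, and only then record latencies. A minimal sketch (the `measure` helper and its parameters are illustrative, not from the talk):

```python
import statistics
import time

def measure(op, warmup=100, samples=1000):
    # Run op() untimed first so caches, JITs, and connection pools
    # settle; only then record steady-state latencies.
    for _ in range(warmup):
        op()
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        op()
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)

steady = measure(lambda: sum(range(1000)), warmup=50, samples=200)
```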

Page 20: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 21: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference

Page 22: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 23: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod

Page 24: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 25: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes

Page 26: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  THE  STATS    

Page 27: Benchmarking (RICON 2014)

Wrong  About  Stats  

•  Too  few  samples    

Page 28: Benchmarking (RICON 2014)

Wrong  About  Stats  

[Chart: Convergence of Median on Samples. Y axis: latency (0 to 120); X axis: time (0 to 60); series: Stable Samples, Stable Median, Decaying Samples, Decaying Median.]
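To see why "too few samples" bites, here is a small sketch under assumptions not from the talk: latencies drawn from a synthetic long-tailed (exponential) distribution, which is closer to real service latency than a Gaussian. The running median jumps around at small sample counts and settles only as n grows.

```python
import random
import statistics

random.seed(42)
# Synthetic long-tailed latencies with a true median near 20 * ln(2) ≈ 13.9.
samples = [random.expovariate(1 / 20.0) for _ in range(2000)]

# Running median after n samples: unstable when n is small,
# converging as n grows.
checkpoints = [10, 50, 200, 1000, 2000]
running_median = {n: statistics.median(samples[:n]) for n in checkpoints}
```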

Page 29: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request → Server → S3 Cache

Page 30: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)

Page 31: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request → Server → S3 Cache

Page 32: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)
• Multimodal distribution

Page 33: Benchmarking (RICON 2014)

Multimodal Distribution

[Chart: # occurrences vs. latency, with modes at 5 ms and 10 ms and markers at the 50% and 99% percentiles.]

Page 34: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)
• Multimodal distribution
• Outliers
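These stats pitfalls compound: on a multimodal sample with an outlier, the mean describes no request anyone actually experienced, while percentiles still do. A sketch with synthetic numbers shaped like the chart above (the `percentile` helper uses the nearest-rank convention, an assumption of this sketch):

```python
import math
import statistics

# Synthetic latency sample: a fast mode near 5 ms (cache hits), a slow
# mode near 10 ms (misses), plus one huge outlier (e.g. a GC pause).
latencies = [5.0] * 90 + [10.0] * 9 + [500.0]

def percentile(values, p):
    # Nearest-rank percentile (one common convention, assumed here).
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

mean_ms = statistics.mean(latencies)  # dragged above both modes by the outlier
p50_ms = percentile(latencies, 50)    # the typical request: 5 ms
p99_ms = percentile(latencies, 99)    # the tail without the outlier: 10 ms
```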

Page 35: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  WHAT  MATTERS    

Page 36: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization

Page 37: Benchmarking (RICON 2014)

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

-- Donald Knuth

Page 38: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads

Page 39: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure

Page 40: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing

Page 41: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing
• Reproducibility of measurements

Page 42: Benchmarking (RICON 2014)

BECOMING  LESS  WRONG  

Page 43: Benchmarking (RICON 2014)

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Page 44: Benchmarking (RICON 2014)

Profiling
Code instrumentation
Aggregate over logs
Traces

Page 45: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program

Page 46: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

•  Choose  your  N  wisely    

Page 47: Benchmarking (RICON 2014)

Choose Your N Wisely (Prof. Saman Amarasinghe, MIT, 2009)

Page 48: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects

Page 49: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
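The clock-resolution point can be checked directly: Python reports each clock's advertised resolution, and an operation shorter than one tick can measure as zero. A sketch (the specific numbers printed vary by platform):

```python
import time

# Python exposes several clocks with different resolutions;
# get_clock_info reports what the platform advertises for each.
res_wall = time.get_clock_info("time").resolution
res_perf = time.get_clock_info("perf_counter").resolution

# An interval shorter than the timer's tick can read as 0.0, so use
# the finest clock (perf_counter) and batch iterations until the
# measured interval spans many ticks.
start = time.perf_counter()
total = sum(range(10_000))
elapsed = time.perf_counter() - start
```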

Page 50: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead code elimination

Page 51: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead code elimination
• Constant work per iteration

Page 52: Benchmarking (RICON 2014)

Non-Constant Work Per Iteration
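A concrete illustration of this pitfall, under an assumption not from the talk: inserting at the front of a Python list shifts every existing element, so iteration k does O(k) work, and the "average time per iteration" depends entirely on how many iterations you ran.

```python
import time

def per_iteration_cost(n):
    # list.insert(0, x) shifts all existing elements, so iteration k
    # does O(k) work: the mean per-iteration time grows with n rather
    # than staying constant.
    lst = []
    start = time.perf_counter()
    for i in range(n):
        lst.insert(0, i)
    return (time.perf_counter() - start) / n

small = per_iteration_cost(1_000)
large = per_iteration_cost(30_000)
# The per-iteration cost for the longer run is markedly higher,
# even though "the operation" being benchmarked looks identical.
```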

Page 53: Benchmarking (RICON 2014)

Follow-­‐up  Material  

• How NOT to Measure Latency by Gil Tene
  - http://www.infoq.com/presentations/latency-pitfalls

• Taming the Long Latency Tail on highscalability.com
  - http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html

• Performance Analysis Methodology by Brendan Gregg
  - http://www.brendangregg.com/methodology.html

• Silverman's Mode Detection Method by Matt Adereth
  - http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/

Page 54: Benchmarking (RICON 2014)

HAVING FUN WITH RIAK

Page 55: Benchmarking (RICON 2014)

Setup      

• SSD, 30 GB
• M3 large
• Riak version 1.4.2-0-g61ac9d8
• Ubuntu 12.04.5 LTS
• 4-byte keys, 10 KB values

Page 56: Benchmarking (RICON 2014)

[Chart: Get Latency. Y axis: latency (usec, 1850 to 2350); X axis: number of keys; annotation: L3.]

Page 57: Benchmarking (RICON 2014)

Takeaway  #1:  Cache  

Page 58: Benchmarking (RICON 2014)

Takeaway  #2:  Outliers  

Page 59: Benchmarking (RICON 2014)

Takeaway  #3:  Workload  

Page 60: Benchmarking (RICON 2014)

Benchmarking:  You’re Doing It Wrong

Aysylu  Greenberg  @aysylu22