benchmarking (ricon 2014)

Benchmarking: You're Doing It Wrong
Aysylu Greenberg (@aysylu22)

Upload: aysylu-greenberg

Posted on 14-Jun-2015


DESCRIPTION

Knowledge of how to set up good benchmarks is invaluable in understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. Done right, benchmarks guide teams to improve the performance of their systems. Done wrong, hours of effort may yield a worse-performing application, upset customers, or worse. In this talk, we will discuss what you need to know to write better benchmarks for distributed systems. We will look at examples of bad benchmarks and the biases that can invalidate the measurements, in the hope of applying our new-found skills correctly and avoiding such pitfalls in the future.

TRANSCRIPT

Page 1: Benchmarking (RICON 2014)

Benchmarking:  You’re Doing It Wrong

Aysylu  Greenberg  @aysylu22  

Page 2: Benchmarking (RICON 2014)
Page 3: Benchmarking (RICON 2014)

To  Write  Good  Benchmarks…  

Need  to  be  Full  Stack  

Page 4: Benchmarking (RICON 2014)

   

Your process vs. Goal
Your process vs. Best Practices

 

Benchmark  =  How  Fast?  

Page 5: Benchmarking (RICON 2014)

Today  

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You're wrong about machines
  - You're wrong about stats
  - You're wrong about what matters

• Becoming Less Wrong
• Having Fun with Riak

Page 6: Benchmarking (RICON 2014)

HOW  NOT  TO  WRITE  BENCHMARKS  

Page 7: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache
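The flawed benchmark from the slide can be written out as a minimal Python sketch. Everything here is illustrative: `fetch_image` is a hypothetical stand-in for the real HTTP request, simulated with a short sleep so the sketch runs. Note how it hits one hot URL, starts measuring immediately, and reports a single mean: exactly the mistakes the following slides pick apart.

```python
import statistics
import time

def fetch_image(url):
    # Hypothetical stand-in for the real HTTP request to the image
    # server; simulated with a short sleep so the sketch is runnable.
    time.sleep(0.001)
    return b"..."

def naive_benchmark(url, accesses=1000, runs=3):
    # The benchmark exactly as the slide describes it: one hot URL,
    # no warmup, start measuring immediately, mean over all accesses.
    latencies = []
    for _ in range(runs):
        for _ in range(accesses):
            start = time.perf_counter()
            fetch_image(url)
            latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies)

mean_latency = naive_benchmark("http://example.com/cat.png", accesses=10, runs=3)
```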

Page 8: Benchmarking (RICON 2014)

WHAT’S  WRONG  WITH  THIS  BENCHMARK?    

Page 9: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  THE  MACHINE    

Page 10: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

•  Cache,  cache,  cache,  cache!  

Page 11: Benchmarking (RICON 2014)

It’s  Caches  All  The  Way  Down  

Web Request → Server → S3 Cache

Page 12: Benchmarking (RICON 2014)

It’s  Caches  All  The  Way  Down  

Page 13: Benchmarking (RICON 2014)

Caches in Benchmarks (Prof. Saman Amarasinghe, MIT, 2009)


Page 18: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 19: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
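One fix for the warmup pitfall is to run the operation untimed until caches, JITs, and connection pools settle, and only then record latencies. A minimal sketch (the `measure` helper and its parameters are illustrative, not from the talk):

```python
import statistics
import time

def measure(op, warmup=100, samples=1000):
    # Run op() untimed first so caches, JITs, and connection pools
    # settle; only then record steady-state latencies.
    for _ in range(warmup):
        op()
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        op()
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)

steady = measure(lambda: sum(range(1000)), warmup=50, samples=200)
```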

Page 20: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 21: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference

Page 22: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 23: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod

Page 24: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

Web Request → Server → S3 Cache

Page 25: Benchmarking (RICON 2014)

Wrong  About  the  Machine  

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes

Page 26: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  THE  STATS    

Page 27: Benchmarking (RICON 2014)

Wrong  About  Stats  

•  Too  few  samples    

Page 28: Benchmarking (RICON 2014)

Wrong  About  Stats  

[Chart: Convergence of Median on Samples. Y axis: latency (0 to 120); X axis: time (0 to 60); series: Stable Samples, Stable Median, Decaying Samples, Decaying Median.]
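To see why "too few samples" bites, here is a small sketch under assumptions not from the talk: latencies drawn from a synthetic long-tailed (exponential) distribution, which is closer to real service latency than a Gaussian. The running median jumps around at small sample counts and settles only as n grows.

```python
import random
import statistics

random.seed(42)
# Synthetic long-tailed latencies with a true median near 20 * ln(2) ≈ 13.9.
samples = [random.expovariate(1 / 20.0) for _ in range(2000)]

# Running median after n samples: unstable when n is small,
# converging as n grows.
checkpoints = [10, 50, 200, 1000, 2000]
running_median = {n: statistics.median(samples[:n]) for n in checkpoints}
```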

Page 29: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request → Server → S3 Cache

Page 30: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)

Page 31: Benchmarking (RICON 2014)

Website  Serving  Images  

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Web Request → Server → S3 Cache

Page 32: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)
• Multimodal distribution

Page 33: Benchmarking (RICON 2014)

Multimodal Distribution

[Chart: # occurrences vs. latency, with modes at 5 ms and 10 ms and markers at the 50% and 99% percentiles.]

Page 34: Benchmarking (RICON 2014)

Wrong  About  Stats  

• Too few samples
• Gaussian (not)
• Multimodal distribution
• Outliers
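These stats pitfalls compound: on a multimodal sample with an outlier, the mean describes no request anyone actually experienced, while percentiles still do. A sketch with synthetic numbers shaped like the chart above (the `percentile` helper uses the nearest-rank convention, an assumption of this sketch):

```python
import math
import statistics

# Synthetic latency sample: a fast mode near 5 ms (cache hits), a slow
# mode near 10 ms (misses), plus one huge outlier (e.g. a GC pause).
latencies = [5.0] * 90 + [10.0] * 9 + [500.0]

def percentile(values, p):
    # Nearest-rank percentile (one common convention, assumed here).
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

mean_ms = statistics.mean(latencies)  # dragged above both modes by the outlier
p50_ms = percentile(latencies, 50)    # the typical request: 5 ms
p99_ms = percentile(latencies, 99)    # the tail without the outlier: 10 ms
```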

Page 35: Benchmarking (RICON 2014)

YOU’RE  WRONG  ABOUT  WHAT  MATTERS    

Page 36: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization

Page 37: Benchmarking (RICON 2014)

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

-- Donald Knuth

Page 38: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads

Page 39: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure

Page 40: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing

Page 41: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing
• Reproducibility of measurements

Page 42: Benchmarking (RICON 2014)

BECOMING  LESS  WRONG  

Page 43: Benchmarking (RICON 2014)

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Page 44: Benchmarking (RICON 2014)

Profiling
Code instrumentation
Aggregate over logs
Traces

Page 45: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program

Page 46: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

•  Choose  your  N  wisely    

Page 47: Benchmarking (RICON 2014)

Choose Your N Wisely (Prof. Saman Amarasinghe, MIT, 2009)

Page 48: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects

Page 49: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
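The clock-resolution point can be checked directly: Python reports each clock's advertised resolution, and an operation shorter than one tick can measure as zero. A sketch (the specific numbers printed vary by platform):

```python
import time

# Python exposes several clocks with different resolutions;
# get_clock_info reports what the platform advertises for each.
res_wall = time.get_clock_info("time").resolution
res_perf = time.get_clock_info("perf_counter").resolution

# An interval shorter than the timer's tick can read as 0.0, so use
# the finest clock (perf_counter) and batch iterations until the
# measured interval spans many ticks.
start = time.perf_counter()
total = sum(range(10_000))
elapsed = time.perf_counter() - start
```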

Page 50: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead code elimination

Page 51: Benchmarking (RICON 2014)

Microbenchmarking:  Blessing  &  Curse  

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead code elimination
• Constant work per iteration

Page 52: Benchmarking (RICON 2014)

Non-Constant Work Per Iteration
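A concrete illustration of this pitfall, under an assumption not from the talk: inserting at the front of a Python list shifts every existing element, so iteration k does O(k) work, and the "average time per iteration" depends entirely on how many iterations you ran.

```python
import time

def per_iteration_cost(n):
    # list.insert(0, x) shifts all existing elements, so iteration k
    # does O(k) work: the mean per-iteration time grows with n rather
    # than staying constant.
    lst = []
    start = time.perf_counter()
    for i in range(n):
        lst.insert(0, i)
    return (time.perf_counter() - start) / n

small = per_iteration_cost(1_000)
large = per_iteration_cost(30_000)
# The per-iteration cost for the longer run is markedly higher,
# even though "the operation" being benchmarked looks identical.
```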

Page 53: Benchmarking (RICON 2014)

Follow-­‐up  Material  

• How NOT to Measure Latency by Gil Tene
  - http://www.infoq.com/presentations/latency-pitfalls

• Taming the Long Latency Tail on highscalability.com
  - http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html

• Performance Analysis Methodology by Brendan Gregg
  - http://www.brendangregg.com/methodology.html

• Silverman's Mode Detection Method by Matt Adereth
  - http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/

Page 54: Benchmarking (RICON 2014)

HAVING FUN WITH RIAK

Page 55: Benchmarking (RICON 2014)

Setup      

• SSD, 30 GB
• M3 large
• Riak version 1.4.2-0-g61ac9d8
• Ubuntu 12.04.5 LTS
• 4-byte keys, 10 KB values

Page 56: Benchmarking (RICON 2014)

[Chart: Get Latency. Y axis: latency (usec, 1850 to 2350); X axis: number of keys; annotation: L3.]

Page 57: Benchmarking (RICON 2014)

Takeaway  #1:  Cache  

Page 58: Benchmarking (RICON 2014)

Takeaway  #2:  Outliers  

Page 59: Benchmarking (RICON 2014)

Takeaway  #3:  Workload  

Page 60: Benchmarking (RICON 2014)

Benchmarking:  You’re Doing It Wrong

Aysylu  Greenberg  @aysylu22