Transcript
Page 1: Benchmarking (RICON 2014)

Benchmarking: You’re Doing It Wrong

Aysylu Greenberg @aysylu22

Page 2: Benchmarking (RICON 2014)
Page 3: Benchmarking (RICON 2014)

To Write Good Benchmarks…

Need to be Full Stack

Page 4: Benchmarking (RICON 2014)

   

your process vs Goal
your process vs Best Practices

Benchmark = How Fast?

Page 5: Benchmarking (RICON 2014)

Today

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You’re wrong about machines
  - You’re wrong about stats
  - You’re wrong about what matters
• Becoming Less Wrong
• Having Fun with Riak

Page 6: Benchmarking (RICON 2014)

HOW NOT TO WRITE BENCHMARKS

Page 7: Benchmarking (RICON 2014)

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

[Diagram: Web Request → Server → S3 Cache]
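The flawed setup above can be sketched in Python (an assumption; the talk shows no code). `fetch_image` is a hypothetical stand-in for the real image request:

```python
# A sketch of the benchmark the talk critiques: one image, 1000 accesses,
# measurement starting immediately, 3 runs, mean reported.
# `fetch_image` is a hypothetical placeholder, not the talk's code.
import time
from statistics import mean

def fetch_image():
    # Stand-in for the HTTP GET against the image server.
    sum(i for i in range(500))

def flawed_run():
    latencies = []
    for _ in range(1000):           # same image every time: caches hide real cost
        t0 = time.perf_counter()    # starts measuring immediately: no warmup
        fetch_image()
        latencies.append(time.perf_counter() - t0)
    return mean(latencies)          # mean hides the tail

results = [flawed_run() for _ in range(3)]   # only 3 runs, on a dev machine
print(mean(results))
```

Each of the sections that follow calls out one of these flaws.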

Page 8: Benchmarking (RICON 2014)

WHAT’S WRONG WITH THIS BENCHMARK?

Page 9: Benchmarking (RICON 2014)

YOU’RE WRONG ABOUT THE MACHINE

Page 10: Benchmarking (RICON 2014)

Wrong About the Machine

• Cache, cache, cache, cache!

Page 11: Benchmarking (RICON 2014)

It’s Caches All The Way Down

[Diagram: Web Request → Server → S3 Cache]

Page 12: Benchmarking (RICON 2014)

It’s Caches All The Way Down

Page 13: Benchmarking (RICON 2014)

Caches in Benchmarks
Prof. Saman Amarasinghe, MIT 2009



Page 19: Benchmarking (RICON 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & timing


Page 21: Benchmarking (RICON 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference


Page 23: Benchmarking (RICON 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod


Page 25: Benchmarking (RICON 2014)

Wrong About the Machine

• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes
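The warmup and timing pitfall can be addressed by running the operation untimed until caches, JITs, and connection pools settle, and only then recording samples. A minimal Python sketch, with `do_request` as a hypothetical placeholder for the operation under test:

```python
import time

def do_request():
    # Placeholder workload; swap in the real request under test.
    sum(i * i for i in range(1000))

def bench(n_warmup=100, n_measured=1000):
    # Warmup phase: results are discarded, not recorded.
    for _ in range(n_warmup):
        do_request()
    # Measured phase: keep every sample so the full distribution survives.
    samples = []
    for _ in range(n_measured):
        start = time.perf_counter()
        do_request()
        samples.append(time.perf_counter() - start)
    return samples

samples = bench()
print(len(samples))
```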

Page 26: Benchmarking (RICON 2014)

YOU’RE WRONG ABOUT THE STATS

Page 27: Benchmarking (RICON 2014)

Wrong About Stats

• Too few samples

Page 28: Benchmarking (RICON 2014)

Wrong About Stats

[Chart: Convergence of Median on Samples; latency (y) vs. time (x); series: Stable Samples, Stable Median, Decaying Samples, Decaying Median]


Page 30: Benchmarking (RICON 2014)

Wrong About Stats

• Too few samples
• Gaussian (not)


Page 32: Benchmarking (RICON 2014)

Wrong About Stats

• Too few samples
• Gaussian (not)
• Multimodal distribution

Page 33: Benchmarking (RICON 2014)

Multimodal Distribution

[Chart: # occurrences vs. latency; modes near 5 ms (the 50th percentile) and 10 ms (the 99th percentile)]

Page 34: Benchmarking (RICON 2014)

Wrong About Stats

• Too few samples
• Gaussian (not)
• Multimodal distribution
• Outliers
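The stats pitfalls above can be made concrete with a synthetic sample (illustrative numbers, not measurements from the talk): a fast mode near 5 ms, a second mode near 10 ms, and a couple of long stalls. The mean lands on a value almost no request actually experienced:

```python
from statistics import mean, median

# Synthetic, illustrative latencies in ms: two modes plus outliers.
fast = [5.0] * 90       # the common case
slow = [10.0] * 8       # a second mode (e.g. cache misses)
stalls = [500.0] * 2    # rare long pauses
latencies_ms = fast + slow + stalls

print(mean(latencies_ms))       # 15.3: dragged up by the stalls
print(median(latencies_ms))     # 5.0: the typical request
p99 = sorted(latencies_ms)[98]  # 99th of 100 samples
print(p99)                      # 500.0: the tail users actually feel
```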

Page 35: Benchmarking (RICON 2014)

YOU’RE WRONG ABOUT WHAT MATTERS

Page 36: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization

Page 37: Benchmarking (RICON 2014)

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

-- Donald Knuth

Page 38: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads

Page 39: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure

Page 40: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing

Page 41: Benchmarking (RICON 2014)

Wrong About What Matters

• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing
• Reproducibility of measurements
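For reproducibility of measurements, one low-cost habit is to generate the workload from a fixed seed and record the environment alongside the results, so a run can be repeated later. A sketch (the function and field names are assumptions, not from the talk):

```python
import platform
import random

def make_workload(seed=42, n_keys=1000):
    # Fixed seed: the same key sequence on every run.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n_keys)]

# Record enough context to rerun the measurement later.
run_metadata = {
    "seed": 42,
    "python": platform.python_version(),
    "machine": platform.machine(),
}

print(make_workload() == make_workload())  # True: identical across runs
```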

Page 42: Benchmarking (RICON 2014)

BECOMING LESS WRONG

Page 43: Benchmarking (RICON 2014)

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Page 44: Benchmarking (RICON 2014)

Profiling

Code instrumentation
Aggregate over logs
Traces
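Code instrumentation can be as small as a decorator that logs per-call wall time; those log lines can then be aggregated over logs offline. A minimal sketch (the `print` stands in for a real logging pipeline):

```python
import functools
import time

def timed(fn):
    """Log the wall-clock time of each call; aggregate the lines later."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_us = (time.perf_counter() - start) * 1e6
            print(f"{fn.__name__} took {elapsed_us:.0f} usec")
    return wrapper

@timed
def handle_request():
    # Hypothetical request handler under observation.
    sum(i for i in range(1000))

handle_request()
```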

Page 45: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program

Page 46: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely

Page 47: Benchmarking (RICON 2014)

Choose Your N Wisely
Prof. Saman Amarasinghe, MIT 2009

Page 48: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects

Page 49: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution

Page 50: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination

Page 51: Benchmarking (RICON 2014)

Microbenchmarking: Blessing & Curse

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration
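The microbenchmark rules above can be sketched together: check the clock's resolution, pick N so the total elapsed time dwarfs it, keep the per-iteration work constant, and consume every result so the work cannot be discarded. (CPython does little dead-code elimination, but JIT-compiled runtimes do; the pattern is the same.)

```python
import time

# How coarse is the timer? N must be large enough that total elapsed
# time is many multiples of this resolution.
resolution = time.get_clock_info("perf_counter").resolution

def op(x):
    return x * x + 1            # constant work per iteration

def microbench(n):
    sink = 0
    start = time.perf_counter()
    for i in range(n):
        sink += op(i)           # consuming the result defeats dead code elimination
    elapsed = time.perf_counter() - start
    return elapsed / n, sink    # returning sink keeps it observably live

per_op, _ = microbench(100_000)
print(per_op > 0)
```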

Page 52: Benchmarking (RICON 2014)

Non-Constant Work Per Iteration

Page 53: Benchmarking (RICON 2014)

Follow-up Material

• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Silverman’s Mode Detection Method by Matt Adereth
  – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/

Page 54: Benchmarking (RICON 2014)

HAVING FUN WITH RIAK

Page 55: Benchmarking (RICON 2014)

Setup

• SSD 30 GB
• M3 large
• Riak version 1.4.2-0-g61ac9d8
• Ubuntu 12.04.5 LTS
• 4 byte keys, 10 KB values

Page 56: Benchmarking (RICON 2014)

[Chart: Get Latency; latency (usec), roughly 1850 to 2350, vs. number of keys, with an “L3” annotation]

Page 57: Benchmarking (RICON 2014)

Takeaway #1: Cache

Page 58: Benchmarking (RICON 2014)

Takeaway #2: Outliers

Page 59: Benchmarking (RICON 2014)

Takeaway #3: Workload

Page 60: Benchmarking (RICON 2014)

Benchmarking: You’re Doing It Wrong

Aysylu Greenberg @aysylu22

