@xaprb
Scalability is Quantifiable
Universal Scalability Law
Baron Schwartz - USENIX - November 2017
Logistics & Stuff
Slides will be posted :)
Ask questions anytime!
Founder of VividCortex
Wrote High Performance MySQL
Love to hear from you: @xaprb and [email protected]
How Systems Fail Under Load
You’ve seen systems become sluggish under high load
How can we describe and reason about what’s happening?
Failure Boundaries
Cook and Rasmussen describe failure boundaries around the operating domain
One such is the unacceptable workload boundary
Workload Failure Isn’t Crisp
Unacceptable workload is not sharply defined; it’s a gradient
Cook lists 18 precepts of system failure in “How Complex Systems Fail”
#5: Complex systems run in degraded mode
Workload Failure Isn’t Crisp
Cook introduces the error margin. What’s the workload margin?
What if you drift into it?
The Failure Boundary Is Nonlinear
This region is highly nonlinear and unintuitive
It’s analogous to post-elastic material behavior
Capacity
Systems can, and do, function beyond their capacity limits.
Capacity limits are scalability limits.
How can we define and reason about system capacity?
Ditto for scalability?
Queueing Theory
There’s a branch of operations research called queueing theory
It analyzes what happens to customers when systems get busy
Difficult to apply in “the real world” of capacity & ops
[Figure: residence time vs. utilization (0.0 to 1.0)]
The hockey stick curve is difficult to use in practice: it’s very nonlinear and hard for humans to intuit
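To see why the curve is hard to intuit, here is a minimal sketch that assumes the simplest textbook queue, M/M/1 (one server, Poisson arrivals, exponential service times). In that model mean residence time is R = S / (1 − ρ) for service time S and utilization ρ. The function name and the 1-unit service time are illustrative, not taken from the talk.

```python
def mm1_residence_time(service_time: float, utilization: float) -> float:
    """Mean residence time (queueing + service) for an M/M/1 queue.

    Assumes Poisson arrivals, exponential service, one server,
    and 0 <= utilization < 1 (the queue is unstable at 1.0).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # With a service time of 1 unit, residence time only doubles at 50% busy,
    # but it goes from about 10x to about 20x between 90% and 95% busy.
    for rho in (0.0, 0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization={rho:.2f}  residence_time={mm1_residence_time(1.0, rho):.1f}")
```

The last few percent of utilization account for most of the growth, which is the "hockey stick."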
Scaling A System: Ideal
Suppose a clustered system can do X work per unit of time
Ideally, if you double the cluster size, it can do 2X work
[Figure: throughput vs. nodes (0 to 10), ideal linear scaling]
Equation
The linear scalability equation:
$$X(N) = \lambda N$$
where X(N) is the throughput with N nodes and 𝜆 is the slope of the line
[Figure: throughput vs. nodes (0 to 10), a straight line]
But Our Cluster Isn’t Perfect
We get speedup by executing tasks in parallel, e.g., scatter-gather
What happens to performance if some portion isn’t parallelizable?
[Figure: execution time and speedup]
Amdahl’s Law
Amdahl’s Law describes the fraction 𝜎 of the work that can’t be done in parallel
Adding nodes provides some speedup, but there’s a ceiling
[Figures: throughput vs. nodes (0 to 10)]
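In throughput form, Amdahl’s Law is commonly written X(N) = λN / (1 + σ(N − 1)), where λ is the throughput of a single node. Setting σ = 0 recovers linear scaling. The minimal sketch below uses illustrative function names and sample numbers to show the ceiling: as N grows, X(N) approaches λ/σ but never reaches it.

```python
def amdahl_throughput(n: float, lam: float, sigma: float) -> float:
    """Throughput with n nodes under Amdahl's Law.

    lam   -- throughput of a single node (the slope of the ideal line)
    sigma -- serial fraction, 0 <= sigma <= 1 (sigma = 0 is linear scaling)
    """
    return lam * n / (1.0 + sigma * (n - 1.0))


def amdahl_ceiling(lam: float, sigma: float) -> float:
    """Limit of amdahl_throughput as n grows without bound (requires sigma > 0)."""
    return lam / sigma


if __name__ == "__main__":
    lam, sigma = 1000.0, 0.05  # illustrative values only
    for n in (1, 2, 4, 8, 16, 64, 1024):
        print(n, round(amdahl_throughput(n, lam, sigma)))
    print("ceiling:", amdahl_ceiling(lam, sigma))
```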
But What If Workers Coordinate?
Suppose the parallel workers have dependencies on each other?
N Workers = N(N-1) Pairs
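N(N − 1) counts ordered pairs, meaning each worker potentially exchanges data with each of the N − 1 others. Counting unordered pairs gives N(N − 1)/2 instead. In a model with a fitted coefficient, that constant factor of 1/2 is simply absorbed into the crosstalk coefficient. Here is a quick sanity check by enumeration (an illustrative sketch, not from the talk):

```python
from itertools import permutations


def ordered_pairs(n: int) -> int:
    """Count ordered (sender, receiver) pairs among n workers by enumeration."""
    return sum(1 for _ in permutations(range(n), 2))


if __name__ == "__main__":
    for n in range(1, 9):
        # Enumeration agrees with the closed form n(n - 1).
        assert ordered_pairs(n) == n * (n - 1)
        print(n, ordered_pairs(n), n * (n - 1) // 2)  # ordered vs. unordered
```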
Universal Scalability Law
Represent the crosstalk (coherence) penalty by the coefficient 𝜅
The system gets less work done as it gets more load!
[Figures: throughput vs. nodes (0 to 10)]
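The USL is usually written X(N) = λN / (1 + σ(N − 1) + κN(N − 1)). It is Amdahl’s Law plus a crosstalk term that grows with N(N − 1). The minimal sketch below uses illustrative parameters, not measurements from the talk. It shows throughput rising, peaking, and then falling as load increases (retrograde scaling).

```python
def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Universal Scalability Law: throughput at concurrency (or node count) n.

    lam   -- throughput at n = 1
    sigma -- serialization (contention) coefficient
    kappa -- crosstalk (coherence) coefficient
    """
    return lam * n / (1.0 + sigma * (n - 1.0) + kappa * n * (n - 1.0))


if __name__ == "__main__":
    lam, sigma, kappa = 1000.0, 0.05, 0.001  # illustrative values only
    for n in (1, 8, 16, 32, 64, 128):
        print(f"N={n:4d}  X={usl_throughput(n, lam, sigma, kappa):8.0f}")
```

With kappa = 0 this reduces to Amdahl’s Law, and with sigma = kappa = 0 it reduces to the linear case.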
Crosstalk Penalty Grows Fast
[Figure: the 𝜎 and 𝜅 penalty terms]
When we reach saturation, the crosstalk term 𝜅N(N−1) grows very rapidly, again creating very nonlinear behavior
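Setting the USL’s derivative to zero shows where the fast-growing crosstalk term overtakes the gains from parallelism. The following sketches that standard calculus, assuming 0 ≤ σ < 1 and κ > 0:

```latex
\begin{align*}
X(N) &= \frac{\lambda N}{D(N)}, & D(N) &= 1 + \sigma(N-1) + \kappa N(N-1) \\
\frac{dX}{dN} &= \lambda\,\frac{D(N) - N\,D'(N)}{D(N)^2}, & D'(N) &= \sigma + \kappa(2N-1) \\
D(N) - N\,D'(N) &= 1 - \sigma - \kappa N^{2} = 0
  & \Longrightarrow\quad N^{*} &= \sqrt{\frac{1-\sigma}{\kappa}}
\end{align*}
```

For example, σ = 0.05 and κ = 0.001 give N* = √950 ≈ 30.8. Below N*, adding load still raises throughput; above it, throughput falls.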
More About Crosstalk
Q: Isn’t crosstalk just a design flaw?
A: Yes and no. Real-life examples include consensus, two-phase commit, NUMA, etc.
Q: Doesn’t it seem odd to assume that crosstalk is a constant?
A: It isn’t. The coefficient 𝜅 is constant, but the amount of crosstalk-related work, 𝜅N(N−1), is a function of N
How Do You Measure Parameters?
How can you measure how much serialization/crosstalk you have?
You don’t: the USL is a black-box model. Measure the things on the axes… (cont’d)
How Do You Measure Parameters?
…Then use regression (least-squares curve fitting) to estimate the parameters of the equation. This lets you figure out 𝜎 and 𝜅 without needing to measure them directly.
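Here is one simple way to do the regression, sketched under an assumption: you have a measurement at N = 1, so λ = X(1) is known. With that, the USL can be rearranged into a polynomial that is linear in its unknowns. Let C(N) = X(N)/λ and x = N − 1. Then N/C(N) − 1 = (σ + κ)x + κx², so an ordinary least-squares fit of y = a·x + b·x² through the origin gives κ = b and σ = a − b. This linearizing transformation is a swapped-in technique. It is not a full nonlinear fit of all three parameters, which a curve-fitting library (for example SciPy’s curve_fit) could also do. The function name is hypothetical.

```python
def fit_usl(ns, xs, lam):
    """Estimate (sigma, kappa) from (concurrency, throughput) measurements.

    Assumes lam, the throughput at N = 1, is known. Uses the rearrangement
        N / (X(N) / lam) - 1 = (sigma + kappa) * x + kappa * x**2,  x = N - 1
    and solves the 2x2 normal equations for y = a*x + b*x**2 (no intercept).
    """
    sxx = sx3 = sx4 = sxy = sx2y = 0.0
    for n, x_n in zip(ns, xs):
        x = n - 1.0
        y = n / (x_n / lam) - 1.0
        sxx += x * x
        sx3 += x ** 3
        sx4 += x ** 4
        sxy += x * y
        sx2y += x * x * y
    det = sxx * sx4 - sx3 * sx3
    if det == 0:
        raise ValueError("singular system: need at least two distinct N > 1")
    a = (sxy * sx4 - sx3 * sx2y) / det
    b = (sxx * sx2y - sx3 * sxy) / det
    return a - b, b  # sigma, kappa


if __name__ == "__main__":
    # Synthetic, noise-free data from known parameters (illustrative only).
    lam, sigma, kappa = 1000.0, 0.05, 0.001
    ns = [1, 2, 4, 8, 16, 32]
    xs = [lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1)) for n in ns]
    print(fit_usl(ns, xs, lam))  # recovers approximately (0.05, 0.001)
```

With real, noisy measurements the estimates will only be approximate, and a fit from a narrow range of N can be unstable.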
Experiment Interactively
desmos.com/calculator/3cycsgdl0b
What is Scalability?
The USL is a mathematical definition of scalability
It’s a function that turns workload into throughput
It’s formally derived and has real physical meaning
But What Is Load?
In most circumstances we care about, load is concurrency
Concurrency is the number of requests in progress
It’s surprisingly easy to measure: sum(latency)/interval
Many systems emit it as telemetry
• MySQL: SHOW STATUS LIKE 'Threads_running'
• Apache: active worker count
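The sum(latency)/interval rule is Little’s Law (concurrency = throughput × latency) applied to a window of requests. The sketch below assumes you have the latencies of requests that completed within a fixed interval. The function and sample data are hypothetical.

```python
def average_concurrency(latencies_s, interval_s):
    """Approximate mean number of requests in progress during an interval.

    latencies_s -- response times (seconds) of requests completed in the interval
    interval_s  -- length of the interval (seconds)
    Requests that straddle the interval boundaries are counted whole, so this
    is an approximation that improves as the interval grows.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return sum(latencies_s) / interval_s


if __name__ == "__main__":
    # 2,000 requests completing in a 1-second window, each taking 5 ms:
    # 2000/s * 0.005 s means about 10 requests in flight on average.
    print(average_concurrency([0.005] * 2000, 1.0))
```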
Four Great Uses Of The USL
1. Forecast Workload Failure Boundary
The USL can reveal the workload failure boundary approaching
[Figure: throughput vs. size (0 to 80)]
1. Forecast Workload Failure Boundary
You can use regression to extract the coefficients, then plot
Or plot and eyeball to see if you’re getting near the edge
[Figure: throughput vs. size (0 to 60)]
1. Forecast Workload Failure Boundary
Coda Hale wrote a post about the USL
https://codahale.com/usl4j-and-you/
1. Forecast Workload Failure Boundary
• By estimating the parameters, you can forecast what you can’t see
• This means you can “load test” under load you don’t yet experience
• The USL is a pessimistic model, so you should expect better
• The USL is pessimistic, but you should be more pessimistic
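Once you have estimates of λ, σ, and κ, forecasting just means evaluating the model at loads you haven’t observed. The sketch below uses illustrative parameters, not measured ones. It also reports the predicted peak N* = √((1 − σ)/κ), past which adding load reduces throughput. Given the pessimism caveats above, treat the output as a rough boundary, not a promise.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """USL throughput at concurrency n."""
    return lam * n / (1.0 + sigma * (n - 1.0) + kappa * n * (n - 1.0))


def usl_peak(sigma, kappa):
    """Concurrency at which USL throughput peaks (requires kappa > 0)."""
    return math.sqrt((1.0 - sigma) / kappa)


if __name__ == "__main__":
    lam, sigma, kappa = 1000.0, 0.05, 0.001  # e.g. output of a fit; illustrative
    n_star = usl_peak(sigma, kappa)
    print(f"peak at N ~ {n_star:.1f}, X ~ {usl_throughput(n_star, lam, sigma, kappa):.0f}")
    # Forecast loads beyond what has been observed so far.
    for n in (40, 60, 80):
        print(n, round(usl_throughput(n, lam, sigma, kappa)))
```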
2. Characterize Non-Scalability
Why doesn’t your system scale perfectly?
The USL reveals the amount of serialization vs. crosstalk
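Once σ and κ are estimated, you can compare the two penalty terms in the USL denominator at any load: σ(N − 1) for serialization and κN(N − 1) for crosstalk. Their ratio is κN/σ, so crosstalk dominates once N exceeds σ/κ. The sketch below uses a hypothetical function name and illustrative values.

```python
def penalty_breakdown(n, sigma, kappa):
    """Split the USL's overhead at load n into serialization and crosstalk.

    Returns (serialization, crosstalk) as fractions of the total overhead
    sigma*(n-1) + kappa*n*(n-1). Requires n > 1 and sigma + kappa > 0.
    """
    serial = sigma * (n - 1.0)
    crosstalk = kappa * n * (n - 1.0)
    total = serial + crosstalk
    return serial / total, crosstalk / total


if __name__ == "__main__":
    sigma, kappa = 0.05, 0.001  # illustrative values only
    print(f"crosstalk dominates beyond N = {sigma / kappa:.0f}")
    for n in (4, 16, 50, 200):
        s, c = penalty_breakdown(n, sigma, kappa)
        print(f"N={n:4d}  serialization={s:.0%}  crosstalk={c:.0%}")
```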
2. Characterize Non-Scalability
PayPal’s Node.js vs. Java benchmarks are a good example!
https://www.vividcortex.com/blog/2013/12/09/analysis-of-paypals-node-vs-java-benchmarks/
3. How Scalable SHOULD It Be?
The USL is a framework for making systems look really bad
Many 10+ node MPP databases barely do anything per node
Calculate per-node a) clients, b) data size, c) throughput
One 18-node database: 4000 QPS, or ~220 QPS/node, at 5 ms latency
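Here is the per-node arithmetic for that example. The sketch assumes the 5 ms is the mean latency and that load is spread evenly across nodes. Little’s Law (concurrency = throughput × latency) also gives the number of queries in flight. About one query in flight per node suggests how little work each node is doing.

```python
def per_node_stats(total_qps, nodes, mean_latency_s):
    """Per-node throughput and, via Little's Law, average queries in flight."""
    qps_per_node = total_qps / nodes
    in_flight_total = total_qps * mean_latency_s  # Little's Law: L = lambda * W
    return qps_per_node, in_flight_total, in_flight_total / nodes


if __name__ == "__main__":
    qps, in_flight, per_node = per_node_stats(4000, 18, 0.005)
    # About 222 QPS per node, 20 queries in flight overall, ~1.1 per node.
    print(f"{qps:.0f} QPS/node, {in_flight:.0f} in flight, {per_node:.1f} per node")
```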
3. How Scalable SHOULD It Be?
This slide showed an animation of how Citus’s distributed database works
For the record: Citus isn’t one of the terribly unscalable DBs
4. See Your Teams As Systems
4. See Your Teams As Systems
“To go fast, go alone. To go far, go together.”
Adrian Colyer wrote a good blog post about teams-as-systems and the USL
https://blog.acolyer.org/2015/04/29/applying-the-universal-scalability-law-to-organisations/
4. See Your Teams As Systems
The USL isn’t novel in that sense…
What Else Can The USL Illuminate?
Open-plan offices: My work takes more work when others are nearby
Map-Reduce: That’s a whole lotta overhead, but it sure is scalable
Mutexes: Theoretically just serialize, but those damn OS schedulers
What’s NOT Scalability?
I commonly see throughput-vs-latency charts
This seems legit till you get systems under high load
[Figure: latency vs. throughput]
Scalability Isn’t Throughput-vs-Latency
The throughput-vs-latency equation has two solutions
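One way to see the two solutions rests on two assumptions: throughput follows the USL, and latency follows Little’s Law, R = N / X(N). Solving X(N) = X for N then gives a quadratic, κX·N² + (X(σ − κ) − λ)·N + X(1 − σ) = 0. Below the peak it has two positive roots, one on the rising side and one on the retrograde side. They have the same throughput but very different latency. Names and parameter values are illustrative.

```python
import math


def concurrency_for_throughput(x, lam, sigma, kappa):
    """Both concurrencies N at which USL throughput equals x.

    Solves kappa*x*N**2 + (x*(sigma - kappa) - lam)*N + x*(1 - sigma) = 0.
    Returns (low, high); raises ValueError if x exceeds the peak throughput.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive for two solutions to exist")
    a = kappa * x
    b = x * (sigma - kappa) - lam
    c = x * (1.0 - sigma)
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("throughput above the USL peak: no solution")
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


if __name__ == "__main__":
    lam, sigma, kappa = 1000.0, 0.05, 0.001  # illustrative values only
    x = 5000.0
    for n in concurrency_for_throughput(x, lam, sigma, kappa):
        # Little's Law: latency R = N / X (seconds, if X is per second).
        print(f"N={n:6.1f}  X={x:.0f}/s  latency={1000 * n / x:.1f} ms")
```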
Concurrency-vs-Latency is OK
It’s a simple quadratic per Little’s Law, and is quite useful
[Figure: response time vs. concurrency (0 to 30)]
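Assume throughput follows the USL. Little’s Law (N = X·R, with N requests in flight) then gives response time as a function of concurrency. The result is a quadratic in N with a single value for each N, unlike throughput-vs-latency:

```latex
R(N) = \frac{N}{X(N)}
     = \frac{1 + \sigma(N-1) + \kappa N(N-1)}{\lambda}
     = \frac{\kappa N^{2} + (\sigma - \kappa)N + (1 - \sigma)}{\lambda}
```

With κ = 0 the curve is a straight line. The κN² term is what bends it upward at high concurrency.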
Some Resources
I wrote a book.
I created an Excel sheet.
Conclusions
Scalability is formally definable, and black-box observable
Scalability is nonlinear; this region is the failure boundary
Scalability is a function with parameters you can estimate
Further Reading/References
• https://www.vividcortex.com/resources/ for ebook, Excel worksheet
• http://www.perfdynamics.com/Manifesto/USLscalability.html for the original source