Understanding Slowness
DESCRIPTION
One of the dying skill sets in today’s engineering teams is the multi-disciplinary analyst who can truly dissect dysfunction in today’s radically complex architectures. As tools emerge that connect the dots, it might be faster to collect the data needed for analysis and decision making, but the knowledge and techniques needed to actually make the assessments are hard to come by. In this session, we’ll walk through a complex architecture and discuss what an engineer in this role really needs to understand. We’ll analyze a few anecdotal problems and see why this world of magical automation and elastic deployments will never really displace the need for root on a production box, a debugger, and the ability to move fast, take risks and destroy performance problems.
TRANSCRIPT
![Page 1: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/1.jpg)
Slow is the new down.
Understanding Slowness
When shit goes wrong, the gloves come off.
![Page 2: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/2.jpg)
Goals
❖ Approach an understanding of your architecture
❖ Convert this understanding into a strategic plan
❖ Develop logistics for diagnosis
❖ Discuss discipline around remediation
![Page 3: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/3.jpg)
The first step
Understand
Build a map
Build two
“If you don’t have a good map of your architecture, Dora will whoop you.”
-Theo
![Page 4: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/4.jpg)
How you’d like to think of
Your architecture
Elegant
Beautiful in its simplicity
Robust
Resilient
![Page 5: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/5.jpg)
When in actuality
Your Architecture
is
Organically grown
Cancerous tumors
Disaster waiting to happen
Hella complicated
of which you are
Inexplicably proud
Photograph courtesy of Herman Rhoids
![Page 6: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/6.jpg)
Map #1
High-level map
Architectural components
Connectedness
Data flow
![Page 7: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/7.jpg)
Map #2
Low-level map
Component versions
Component languages
OS/NICs/HBAs
Location
Switches/Routers/FW
Connected Service details
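One way to keep both maps honest is to hold them as data rather than as a diagram. The sketch below is only an illustration: the component names, versions, and locations are hypothetical, and the fields simply mirror the items listed on the two map slides.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One entry in the low-level map (all values here are made up)."""
    name: str
    version: str
    language: str
    os: str
    location: str
    service_details: dict = field(default_factory=dict)

# Map #1 (high level): which component sends data to which.
data_flow = {"web": ["api"], "api": ["db"], "db": []}

# Map #2 (low level): the details behind each box in map #1.
low_level = {
    "web": Component("web", "1.4.2", "Go", "Linux", "rack-12"),
    "api": Component("api", "3.0.1", "Java", "Linux", "rack-12"),
    "db": Component("db", "9.3", "C", "illumos", "rack-14",
                    {"port": 5432}),
}

def downstream(name, flow):
    """Every component reachable from `name` by following data flow."""
    seen, stack = [], list(flow[name])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(flow[node])
    return seen

# When "web" is slow, these are the components worth looking at next.
print(downstream("web", data_flow))  # ['api', 'db']
```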
![Page 8: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/8.jpg)
Develop
Strategic Plan
There are 2 types of useful SREs:
Spanning several boundaries
Spanning all boundaries
Photograph courtesy of Tambako The Jaguar https://www.flickr.com/photos/tambako/4598642399
![Page 9: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/9.jpg)
You can’t play ball without bases.
Who’s on first?
Establish who is responsible for each component in each context.
Establish who is responsible when that person fails (upward).
Establish who is responsible when that person needs help (upward and downward).
![Page 10: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/10.jpg)
Nothing will ever be “broken” if it isn’t expected to “work.”
Expectations
Set expectations for breakages and slowdowns.
What you build will break; understanding under what stress is your job as an engineer.
![Page 11: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/11.jpg)
Parts are parts.
Ø tech loyalty
Constructing a solution from parts.
Parts are replaceable.
Have a list of replacement vendors or part alternates.
If you design a solution relying on a part available only from a single vendor, you have accomplished lock-in.
Photograph courtesy of Jason Ilagan https://www.flickr.com/photos/thepen/428014152
![Page 12: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/12.jpg)
When things are broken (or slow)
Logistics matter
Observability
Tool parity
Safety harnesses
![Page 13: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/13.jpg)
You cannot improve what you cannot measure
Measure
Cut once
Rear Admiral Grace Murray Hopper 1906-1992
![Page 14: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/14.jpg)
The one beast you cannot slay:
Latency
You must subdue it
First you must understand it
![Page 15: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/15.jpg)
Averages are for chumps
Histograms over Aggregations
Reducing many observations S to N values (∀ |N| << |S|) is the definition of lossy.
or… “you don’t know shit”
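To make the lossiness concrete, here is a minimal sketch with made-up latencies (the sample values and the 100 ms bucket width are assumptions for illustration, not data from the talk). Two workloads share the same mean, yet only the histogram shows that one of them has a slow second mode.

```python
from collections import Counter
from statistics import mean

# Hypothetical latency samples in milliseconds (illustrative only).
steady = [200] * 100                 # every request takes 200 ms
bimodal = [100] * 90 + [1100] * 10   # 90% fast, 10% very slow

def histogram(samples, bucket_ms=100):
    """Count samples per bucket; keys are the lower bound of each bucket."""
    counts = Counter((s // bucket_ms) * bucket_ms for s in samples)
    return dict(sorted(counts.items()))

# Both means come out identical, so the average cannot tell them apart.
print(mean(steady), mean(bimodal))
# The histograms are obviously different.
print(histogram(steady))
print(histogram(bimodal))
```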
![Page 16: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/16.jpg)
Exploring quantiles is simple and can provide increased understanding.
Quantiles
Time-series histograms are a lot of information to digest.
Moving quantiles can often provide much more insight.
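A moving quantile can be sketched in a few lines. This version assumes a nearest-rank quantile and a fixed-size sliding window; the function names, window size, and sample latencies are hypothetical, and a production system would more likely compute quantiles from stored histograms.

```python
import math
from collections import deque

def quantile(values, q):
    """Nearest-rank quantile: smallest value with at least a fraction q
    of the data at or below it (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def moving_quantile(samples, window, q):
    """Return the q-quantile of each full sliding window over samples."""
    recent = deque(maxlen=window)
    out = []
    for s in samples:
        recent.append(s)
        if len(recent) == window:
            out.append(quantile(recent, q))
    return out

# Hypothetical latencies in ms with a single 95 ms outlier.
latencies_ms = [10, 12, 11, 10, 95, 11, 10, 12, 11, 10]
print(moving_quantile(latencies_ms, window=5, q=0.5))   # median never moves
print(moving_quantile(latencies_ms, window=5, q=0.99))  # tail shows the spike
```

The moving median stays at 11 ms throughout, while the moving q(0.99) sits at 95 ms for as long as the outlier is inside the window.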
![Page 17: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/17.jpg)
Remember that you’re consolidating time.
Granular data
Time consolidation is needed.
It can be misleading.
Ask good statistical questions.
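One common way consolidation misleads is averaging per-window quantiles. The sketch below uses two hypothetical windows (a busy one and a quiet one) and a nearest-rank percentile; all numbers are made up for illustration. The average of the two window p95 values is far from the p95 of the combined data.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]

# Hypothetical latencies in ms for two consecutive windows.
busy_window = [10] * 90 + [500] * 10   # 100 requests, 10% slow
quiet_window = [10] * 10               # 10 requests, all fast

per_window = [percentile(busy_window, 95), percentile(quiet_window, 95)]
averaged = sum(per_window) / len(per_window)
merged = percentile(busy_window + quiet_window, 95)

print(per_window)  # [500, 10]
print(averaged)    # 255.0, a latency no request ever experienced
print(merged)      # 500, the p95 over all 110 requests
```

Keeping the histograms and merging them before asking for a quantile avoids this; the quiet window should not get the same vote as the busy one.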
![Page 18: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/18.jpg)
Knowing your q(0.99) is “too high” is one thing…
Work backwards
At what quantile are you?
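One reading of working backwards is the inverse quantile: instead of asking what latency sits at q(0.99), pick the latency you care about and ask what fraction of requests made it. A minimal sketch, with a hypothetical 250 ms target and made-up samples:

```python
from bisect import bisect_right

def quantile_of(samples, threshold):
    """Fraction of samples at or below threshold (the inverse quantile)."""
    ordered = sorted(samples)
    return bisect_right(ordered, threshold) / len(ordered)

# Hypothetical latencies in ms.
latencies_ms = [120] * 80 + [300] * 15 + [900] * 5
target_ms = 250  # assumed latency target, not from the talk

print(quantile_of(latencies_ms, target_ms))  # 0.8
```

Here the 250 ms target is met at roughly q(0.8), which says directly that about one request in five misses it.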
![Page 19: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/19.jpg)
mvalue: http://www.brendangregg.com/FrequencyTrails/modes.html
Understand Workloads
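A rough sketch of a modality score in the spirit of the mvalue linked above. It assumes the score is the sum of absolute height differences between adjacent histogram bins (padded with an empty bin at each end), divided by the tallest bin; check the linked page for the authoritative definition and for how to choose bins. The function name and the sample bin counts are hypothetical.

```python
def mvalue(bin_counts):
    """Sum of absolute differences between adjacent bins (zero-padded at
    both ends), divided by the height of the tallest bin."""
    if not bin_counts or max(bin_counts) == 0:
        raise ValueError("need at least one non-empty bin")
    padded = [0, *bin_counts, 0]
    total = sum(abs(b - a) for a, b in zip(padded, padded[1:]))
    return total / max(bin_counts)

# One hump: up once, down once.
print(mvalue([1, 3, 1]))        # 2.0
# Two humps score higher, hinting at two workloads mixed together.
print(mvalue([1, 3, 1, 3, 1]))
```

Under this assumption a single clean mode scores 2, and each additional well-separated mode pushes the score up, which is a cheap hint that more than one workload is hiding in the data.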
![Page 20: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/20.jpg)
man(1) is a tool’s tool.
Tools
Tools do not a master craftsman make.
Regardless, know your damn tools.
There are three types of tools.
Photograph courtesy of James Bowe https://www.flickr.com/photos/jamesrbowe/7164489201
![Page 21: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/21.jpg)
Tool type #1
Observation
Taking measurements.
Inspecting state.
Inspecting conversation.
Photograph courtesy of Gordon Wrigley https://www.flickr.com/photos/tolomea/4196160169
![Page 22: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/22.jpg)
Tool type #2
Synthesis
Synthesizes something to enable the use of tool type #1.
Photograph courtesy of Simon Yao https://www.flickr.com/photos/smjb/8107539280
![Page 23: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/23.jpg)
Tool type #3
Manipulation
Changing state.
Used for testing hypotheses.
Photograph courtesy of DragonFlyCC https://www.flickr.com/photos/ladydragonflyherworld/4299545598
![Page 24: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/24.jpg)
Favorite tools
Martial Arts
#1
• DTrace
• truss/ktrace/strace
• tcpdump/snoop
• mdb/gdb/dbx/lldb
• sar/mpstat/iostat/vmstat
#2
• curl
#3
• vi/echo
• sysctl/mdb(-w)
• DTrace(-w)
Photograph courtesy of Republic of Korea https://www.flickr.com/photos/koreanet/6099430458
![Page 25: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/25.jpg)
Lorem Ipsum Dolor Indeed
Anecdotes
This one time at band camp
Photograph courtesy of umjanedoan https://www.flickr.com/photos/umjanedoan/497411169
![Page 26: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/26.jpg)
Latency
I’m huge in Japan
Latency for a hot landing page jumps from around 300ms to around 450ms.
No changes in latency to other regions.
![Page 27: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/27.jpg)
Latency
Scrub in or go home
Latency for disk writes radically changes behavior.
It’s as if we have a new workload.
We do not have a new workload.
… we do have a new workload.
Photograph courtesy of Phalinn Ooi https://www.flickr.com/photos/umjanedoan/497411169
![Page 28: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/28.jpg)
Latent effect
Hitting the wall
Disk I/O latency goes to hell at 3pm.
Turns out disk throughput has plateaued.
No change in configuration near 3pm.
Oops, I tripped at 10am.
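The slide does not say what the mechanism was, so the following is only one hypothetical way a 10am cause can surface at 3pm: something upstream absorbs the overload (a cache or write buffer, assumed here) until it is full. Every number and name below is invented for illustration.

```python
def hour_buffer_fills(start_hour, arrival_per_hour, drain_per_hour,
                      buffer_capacity):
    """Return the hour of day at which the backlog first reaches the
    buffer capacity, or None if it never does before midnight."""
    excess = arrival_per_hour - drain_per_hour
    if excess <= 0:
        return None  # the disk keeps up, so nothing accumulates
    backlog = 0
    for hour in range(start_hour, 24):
        backlog += excess
        if backlog >= buffer_capacity:
            return hour + 1  # full by the end of this hour
    return None

# Hypothetical: a 10am change adds 1,000 units/hour more than the disk
# (already at its throughput plateau) can drain; the buffer holds 5,000.
print(hour_buffer_fills(start_hour=10, arrival_per_hour=6000,
                        drain_per_hour=5000, buffer_capacity=5000))  # 15
```

In this toy model nothing looks wrong for five hours, which is why searching for a change near 3pm comes up empty.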
Illustration courtesy of Jeff Warren https://www.flickr.com/photos/jeffreywarren/354553098
![Page 29: Understanding Slowness](https://reader033.vdocuments.site/reader033/viewer/2022051611/54b7b67a4a7959bf688b4711/html5/thumbnails/29.jpg)
Thank You