The Fallacy of Fast
Ines Sombra
@Randommood
Obligatory Disclaimer
Things you will see in this talk
Fast Talking & Opinions TM
Un-tweetable moments
Rantifestos TM
Questionable language
A therapy llama
and ZERO Kevin Bacons
Common Mistakes
What Matters
SPOILER alert
Reason about design choices in terms of trade-offs
Chosen trade-offs make the foundation of your system
Common Mistakes
Accidentally de-emphasizing long-term quality & system stability
De-prioritizing Testing
Cutting corners on testing carries a hidden cost
Test the full system: client, code, & provisioning code
Code reviews != tests. Have both
Continuous Integration (CI) is critical to velocity, quality, & transparency
De-prioritizing Releases
Release stability is tied to system stability
Iron out your deploy process!
Dependencies on other systems make this even more important
Canary testing, dark launches, feature flags, etc. are good
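One way to approximate a canary-style rollout is to bucket users deterministically, so the same user always lands on the same side of the release. This is a minimal sketch, not something from the talk: `in_canary`, the user IDs, and the 10% figure are all hypothetical.

```python
import hashlib


def in_canary(user_id, percent):
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing the ID (instead of picking at random) keeps the decision stable
    across requests and processes.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Hypothetical user IDs, for illustration only.
users = [f"user-{n}" for n in range(1000)]
canary_users = [u for u in users if in_canary(u, 10)]
```

Because the bucket depends only on the ID, raising the percentage only ever adds users to the canary group; nobody flips back and forth between versions.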
De-prioritizing Ops
Automation shortcuts taken while in a rush will come back to haunt you
Playbooks are a must have
Localhost is the devil
Sloppy operational work is the mother of all evils
De-prioritizing Insight
“Future-you monitoring” is bad; make monitoring part of the MVP
Alert fatigue has a high cost, don’t let it get that far
Link alerts to playbooks
Routinely test your escalation paths
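Linking alerts to playbooks can be enforced with a small check that runs in CI. The sketch below assumes alert definitions live in a plain list of dicts; the alert names and playbook paths are hypothetical.

```python
# Hypothetical alert definitions; a real setup would load these from
# the monitoring system's own config.
ALERTS = [
    {"name": "api_error_rate_high", "playbook": "playbooks/api-errors.md"},
    {"name": "queue_depth_high", "playbook": ""},
]


def alerts_missing_playbook(alerts):
    """Return the names of alerts that have no playbook link."""
    return [a["name"] for a in alerts if not a.get("playbook")]


missing = alerts_missing_playbook(ALERTS)
if missing:
    print("Alerts without a playbook:", ", ".join(missing))
```

Failing the build when `missing` is non-empty is one way to keep new alerts from shipping without a playbook.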
De-prioritizing Knowledge
The inner workings of data components matter. Learn about them
System boundaries ought to be made explicit
Deprecate your go-to person
De-prioritizing Security
The internet is an awful place
Expect DoS/DDoS
Think about your system, its connections, and their dependencies
Having the ability to turn off features/clients helps
What we learned
Service ownership implies leveling-up operationally
Architectural choices made in a rush can have a long shelf life
Don’t sacrifice tests. Test the FULL system
RANTIFESTO!
Building Shrines of Agile
Assuming a given methodology will solve everything is naive at best
Magical thinking leads to misaligned expectations
All tools are terrible, avoid religious wars
[Figure: “When to Scrum?”, stolen from Angela Druckman. Axes: Requirements (close to agreement to far from agreement) and Technology (close to certainty to far from certainty). Labels on the chart: Simple, Complicated, Complex, Anarchy, Scrum.]
“In truth a range of approaches, a hybrid mix, of management methods is
required to succeed in today's enterprise IT environment. That customer
enterprise environment never was like the simplified product development environment where Agile software development was conceived…”
DUH
Agile Gotchas
Uncertainty in problem domain (and company size) will challenge your ability to adhere to it
Has a cost but it’s different
Nihilism FTW?
What Matters
Mind System Design
Simple & utilitarian design takes you a long way
Use well-understood components
NIH (not invented here) is a double-edged sword
Use feature flags & on/off switches (test them!)
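A feature flag with an on/off switch can be very small, and testing both positions is cheap. This is a minimal sketch assuming a hypothetical in-memory `FeatureFlags` store and a made-up `new_ranking` flag; a real system would typically back it with a config service.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store; unknown flags default to off."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)


def render_search(query, flags):
    """Serve the new code path only while its flag is on."""
    if flags.is_enabled("new_ranking"):
        return f"new:{query}"
    return f"old:{query}"


flags = FeatureFlags({"new_ranking": True})
on_result = render_search("llama", flags)

# Flip the switch off and confirm the old path still works.
flags.set("new_ranking", False)
off_result = render_search("llama", flags)
```

Defaulting unknown flags to off means a typo in a flag name fails safe: the old path runs.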
Meet Alice
“I’m way too cool for this outfit”
Alice’s Testing Areas (area: goal)
Correctness: good output from good inputs
Error: reasonable reaction to incorrect input
Performance: Time to Task (TTT) for single node, multi node, clustered, cache enabled
Robustness: behavior after a given # of inputs/outputs, a given uptime
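The four testing areas can be sketched as four checks against one function. Everything here is illustrative: `parse_port` is a hypothetical function under test, and the time budget and call counts are arbitrary assumptions.

```python
import time


def parse_port(text):
    """Hypothetical function under test: parse a TCP port number."""
    port = int(text)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def check_correctness():
    # Good output from good input.
    return parse_port("8080") == 8080


def check_error():
    # Reasonable reaction to incorrect input: a clear exception.
    try:
        parse_port("not-a-port")
    except ValueError:
        return True
    return False


def check_performance(budget_seconds=1.0, calls=10_000):
    # Time to complete a fixed task stays within an assumed budget.
    start = time.perf_counter()
    for _ in range(calls):
        parse_port("443")
    return time.perf_counter() - start < budget_seconds


def check_robustness(calls=100_000):
    # Behavior after a given number of inputs stays consistent.
    return all(
        parse_port(str(1 + i % 65535)) == 1 + i % 65535 for i in range(calls)
    )


results = {
    "correctness": check_correctness(),
    "error": check_error(),
    "performance": check_performance(),
    "robustness": check_robustness(),
}
```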
A Testing Harness
Is a fantastic thing to have
Invest in QA automation engineers
Adding support for regressions & domain-specific testing pays off
Mind System Configs
System assumptions are dangerous, make them explicit
Standardize system configuration (data bags, config file, etc.)
Hardcoding is the devil
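One way to make assumptions explicit is to load every setting from configuration and fail loudly when one is missing, instead of falling back to a hardcoded value. The sketch below assumes a JSON config; the `ServiceConfig` fields and values are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical settings; each one is an assumption made visible."""

    db_host: str
    db_port: int
    request_timeout_s: float


REQUIRED = ("db_host", "db_port", "request_timeout_s")


def load_config(raw_json):
    """Parse config and refuse to start if any required key is absent."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    return ServiceConfig(
        db_host=str(data["db_host"]),
        db_port=int(data["db_port"]),
        request_timeout_s=float(data["request_timeout_s"]),
    )


config = load_config(
    '{"db_host": "db.internal", "db_port": 5432, "request_timeout_s": 2.5}'
)
```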
Mind System Limits
Rate limit your API calls, especially if they are public or expensive to run
Instrument / add metrics to track them
Rank your services & data (what can you drop?)
Capacity analysis is not dead
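A token bucket is one common way to rate limit API calls, and counting rejections gives a metric to track. This is a sketch under stated assumptions: the `TokenBucket` class, its rate, and its capacity are illustrative, and the clock is injectable only so the example is deterministic.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens/second."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()
        self.rejected = 0  # simple counter to export as a metric

    def allow(self):
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        self.rejected += 1
        return False


# A fake clock keeps the example deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_s=1, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third call exceeds the burst size
fake_now[0] += 1.0                          # one second passes, one token refills
after_refill = bucket.allow()
```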
Mind System Growth
Watch out for initial over-architecting
“The application that takes you to 100k users is not the same one that takes you to 1M, and so on…” @netik
Expect changes & refactors
Mind Process
Architectural reviews FTW
Request flow, API shape, Failure conditions, Reliability, Data model, Threat modeling, Testing strategy, Operations, Monitoring, Logging & Alerting, Pricing/Billing, Supported clients, etc.
Mind Resources
Redundancies of resources, execution paths, and checks, along with replication of data, replay of messages, and anti-entropy, build resilience
Mechanisms to guard system resources are good to have
Your system is also tied to the resources of its dependencies
Distrust Is Healthy
Distrust client behavior, even if the clients are internal
Decisions have an expiration date. Periodically re-evaluate them, as past you was much dumber
A revisionist culture produces more resilient systems
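Distrusting client behavior can start with validating every request the same way, whether or not the caller is internal. A minimal sketch; the request shape, `client_id`, and the `MAX_PAGE_SIZE` limit are hypothetical.

```python
MAX_PAGE_SIZE = 100  # hypothetical limit, not from the talk


def validate_list_request(request):
    """Return a list of problems with a request dict; empty means acceptable."""
    errors = []

    client_id = request.get("client_id")
    if not isinstance(client_id, str) or not client_id:
        errors.append("client_id is required")

    page_size = request.get("page_size")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(page_size, int) or isinstance(page_size, bool):
        errors.append("page_size must be an integer")
    elif not 1 <= page_size <= MAX_PAGE_SIZE:
        errors.append(f"page_size must be between 1 and {MAX_PAGE_SIZE}")

    return errors


ok = validate_list_request({"client_id": "billing-svc", "page_size": 50})
bad = validate_list_request({"client_id": "billing-svc", "page_size": 1_000_000})
```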
About Resilience
[Figure: probability of failure vs. rank, stolen from Paul Borrill. Three regions: classical engineering, reactive operations, and unk-unk (unknown unknowns).]
Unk-unk: cascading or catastrophic failures & you don’t know where they will come from! Same area as the other 2 combined
Building Resilience
Code standards
Programming patterns
Testing (full system!)
Metrics & monitoring
Convergence to good state
Hazard inventories
Redundancies
Feature flags
Dark deploys
Runbooks & docs
Canaries
System verification
Formal methods
Fault injection
What we learned
The goal is to build failure domain independence
Keep track of your technical debt & repay it regularly
It’s about lowering the risk of change with tools & culture
Mind assumptions
TL;DR
Things that are easy to sacrifice may be harder to correct later
Think in terms of trade-offs
TESTING MATTERS!
Not all process is evil
Keep in Mind
Make system boundaries & dependencies explicit
Playbooks are your friends, have them
Use kill switches & limits
Prioritize your services
Distributed systems
github.com/Randommood/FallacyOfFast
@Randommood