i volunteer as tribute: the future of oncall (uptime)
TRANSCRIPT
@bridgetkromhout
I volunteer as tribute: the future of oncall
lives: Minneapolis, Minnesota
works: Pivotal
podcasts: Arrested DevOps
organizes: devopsdays
Bridget Kromhout
traded oncall… …for more travel (similar effect on sleep)
things fall apart
“In a world that celebrates pioneers— be the settlers instead.”
— Laura Bell (@lady_nerd)
previously, on #opslife…
Image credit: James Ernest
Image credit: 00abstrahiert99 on Flickr
…but #opslife means I’m a cynical realist
Attack Kitten is skeptical about NoOps
Attack Kitten Cat Reality Check
empathy
serverless (in the brave new cloudy-with-a-chance-of-containers world)
two-pizza silo
Image credit: Wikipedia
“Any organization that designs a system… will produce a design whose structure is a copy of the organization's communication structure.” Mel Conway
Image credit: Vasa Museet
probably fine
in a perfect world
for ops, don’t tell devs: gl;hf!
do: automate, document, share
for devs, build for operability:
observability, debuggability, reality
The Wall of Confusion
yolo nope
Image credit: wikimedia
“The past is never dead. It's not even past.” William Faulkner
limited custom dev; network incidents
oncall handled by ops only
Image credit: Wallpaper Up
limited custom dev; colo incidents
oncall handled by ops only
low trust; difficult to grant partial access
oncall handled by ops only
everyone’s on call!!1!
high trust; variable ability
Image credit: Robot Unicorn Attack 2
ops on call; devs available
building trust; variable visibility
shared oncall; branching decision tree
follow-the-sun if possible
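A follow-the-sun rotation can be sketched as a lookup from the current UTC hour to whichever region is in working hours. This is a minimal illustration only: the region names and the three 8-hour shifts are assumptions, not something the slides specify.

```python
from datetime import datetime, timezone

# Hypothetical regions and the UTC hours each one covers (assumed 8-hour shifts).
SHIFTS = [
    ("apac", range(0, 8)),
    ("emea", range(8, 16)),
    ("amer", range(16, 24)),
]


def oncall_region(now: datetime) -> str:
    """Return the region whose shift covers the given timezone-aware moment."""
    hour = now.astimezone(timezone.utc).hour
    for region, hours in SHIFTS:
        if hour in hours:
            return region
    raise ValueError(f"no shift covers hour {hour}")


if __name__ == "__main__":
    print(oncall_region(datetime.now(timezone.utc)))
```

Pages then route to a team that is awake, which is the point of follow-the-sun: nobody carries the overnight shift.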
oncall investments: architecture, observability, culture
keep on shipping (implementation details vary)
tree failure?!?
architecture: plan for continuous partial failure
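One way to plan for partial failure is to give every dependency call a bounded number of attempts and a degraded answer to fall back on. The sketch below assumes a hypothetical helper, `call_with_fallback`, and is only one of many possible patterns (timeouts, circuit breakers, and backoff are left out for brevity).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(primary: Callable[[], T], fallback: T, retries: int = 2) -> T:
    """Try `primary` up to `retries + 1` times; return `fallback` if all attempts fail."""
    for _ in range(retries + 1):
        try:
            return primary()
        except Exception:
            # Assumption: any exception counts as a transient failure.
            # A real system would log the error and back off before retrying.
            continue
    return fallback


if __name__ == "__main__":
    def always_down() -> str:
        raise ConnectionError("dependency unavailable")

    # The caller still gets a usable (if stale) answer instead of an outage.
    print(call_with_fallback(always_down, fallback="cached value"))
```

The design choice here is that the caller decides what "degraded but acceptable" means, so one failing dependency does not have to page a human.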
CAP: Consistency, Availability, Partition Tolerance (pairings: CA, CP, AP)
“a partition is a time bound on communication.” Eric Brewer
observability: answering questions we didn’t know to ask
observability: understand your environment
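Answering questions nobody thought to ask in advance usually means recording rich context per request rather than a few predefined counters. As a hedged sketch, the hypothetical `make_event` helper below emits one structured JSON line per event; every field name in the usage example is illustrative.

```python
import json
import time


def make_event(name: str, **fields) -> str:
    """Serialize one event, with arbitrary context fields, as a JSON line."""
    event = {"event": name, "ts": time.time(), **fields}
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    # Illustrative fields: attach whatever context might matter later.
    print(make_event("checkout", user_id="u123", region="us-east",
                     duration_ms=412, status=500))
```

Because each line carries its own context, later queries can slice by any field (user, region, status) without a code change.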
monitoring: the old way
Monitoring
monitoring: the new way
monitoring needs of…
The business:
UX data for product & engineering
Measure value delivered
Information Technology:
Visibility into state and failures
Product & engineering decisions
Measure success of projects
The Art of Monitoring (2016), James Turnbull
artofmonitoring.com
culture of collaboration
a tranquil beach… or is it?
learning culture: be adaptable
Computers are easy; people are hard
Massively scalable fault-tolerant distributed systems require a significant engineering effort to build and operate; complex socio-technical systems are even more challenging.
Who owns your availability? The answer may surprise you!
Image credit: Wikipedia
not actually 20 units of devops
silos are for grain
still computers
oncall blood and tears don’t scale
gif credit: @paddyforan
don’t volunteer as tribute
invest in architecture, observability, culture