Rollback: The Impossible Dream
DESCRIPTION
Roll back doesn’t exist. It’s not real. It’s a fantasy, a dream, a delusion. Any vendor who tells you they have a roll back capability is lying to you, and lying in a downright dangerous way that will come back to haunt you at 4am in a war room when someone says:

“We can’t fix this. Let’s roll back the deployment.”

This talk is designed to explain and demonstrate to Operations staff:

- Why roll back is a fantasy, explained with a dash of Werner Heisenberg
- Why it is dangerous and how you can recognize when you’re about to get trapped
- How you can avoid falling into the trap of considering it an appropriate compensating control

It’ll also explain what you can actually do operationally instead of “rolling back”. This will cover alternative compensating controls that can help you get running again and resolve your outage whilst still allowing you to find root cause.

TRANSCRIPT
Rollback: The Impossible Dream
by James Turnbull
jamtur01 @ github
kartar @ twitter
jamesturnbull on freenode
james @ puppetlabs.com
About Me
VP Technical Operations at Puppet Labs
Puppet guy
Ruby guy
Talks funny
A show of hands
Who thinks they know what rollback is?
Last set of hands
YMMV
Definitions
Traditional
Modern
Fact or Fiction?
Accept certain constraints
Constraint #1: Apply sufficient capital
Constraint #2: Idempotent
Constraint #3: Cascade-less failure
Constraint #4: Resources
A Philosophical Digression
If I know where I am, I don’t know how I got there
If I know how I got there, I don’t know where I am
Very few “systems” are truly deterministic
A Mathematical Digression
On system rollback and totalised fields: An algebraic approach to system change
Mark Burgess and Alva Couch, 20th June 2011
http://cfengine.com/markburgess/papers/totalfield.pdf
So what’s wrong with rollback?
Risk
Learning from mistakes
Complex systems are … complex
Human error
What is the problem rollback is trying to solve?
What is the problem YOU are trying to solve?
So how can we mitigate rollback shortcomings?
Preventative Design
Rollback is (often) an architecture problem
Increase Resilience
Operational Intelligence
A little bit of DevOps in every byte…
Small, iterative changes
Accept that failure happens
“We can’t test that? Okay, we can roll it back if it breaks…”
Assumption is the mother of all fuckups*
“But the system can’t be {run|upgraded|deployed} like that because…”
Conclusions
Rollback is possible but not probable
If you have to have “rollback”, accept constraints
You can mitigate the need for it
Thank you!
Questions/Insults?
jamtur01 @ github
kartar @ twitter
jamesturnbull on freenode
james @ puppetlabs.com