getting from folsom to grizzly - a devops upgrade pattern.pptx

From Folsom 2 Grizzly
A DevOps Upgrade Pattern
Greg Althaus
Dell, Principal Engineer

http://lifeatthebar.com

Getting our Bearings, Starting a Journey
• The “Problem” with Migration
• Paths to Nirvana (or Roads to Perdition)
• Alternatives
• An Opinion
• Discussion

http://learn.genetics.utah.edu/content/begin/cells/organelles/


The Problem: The Bus is Rolling!
• OpenStack has a 3-month release cycle (major and minor)
  • Major version every 6 months
  • Minor (but important) versions 3 and 6 months after each release

• Lots of Changes
  • Bugs are fixed
  • Operating systems upgrade
  • New technologies appear
  • Whole projects are split off

• We expect operators to
  • Keep systems running
  • Never lose data
  • And… stay up to date

http://cdn2.arkive.orgsockeye-salmon-predated-by-grizzly-bear-on-migration-upstream.jpg

So you want to Upgrade? I’ve got Questions!
• What are we upgrading?
  • OpenStack - Yes!
  • Dependent packages - Probably?
  • Base OS - Maybe?

• What is the state during the "in-between" time?
  • Infrastructure downtime?
  • VM downtime? VM reboot? Controlled/informed?
  • Availability windows?

• What are the contingency plans?
  • Dry run? Maybe.
  • Recover by going backwards? Maybe.

• What level of safety and trust do you need?
  • Assure data integrity?
  • Assure infrastructure integrity?
  • Maintain security?

• How long can the migration take?
  • Big-bang move or gradual migration?
  • How will my API consumers/ecosystem cope?
  • Can Keystone Grizzly work with Folsom Nova???
  • What about futures? G.1 to G.2? H to I?
  • Can I skip versions? Jump from G to I?

http://www.publicdomainpictures.net
Steep Steps by Peter Griffin

Let’s Start at the Basics
• Beginning Answers
  • Distros will manage dependencies and packaging
  • We can’t lose data or compromise security
  • Infrastructure state and integrity will vary by solution

• Assumptions for Staging
  • Some managed environment (not a manual deploy)
  • A staging/test environment to get "familiar" with the problem
  • A maintenance window for production, which limits the scope of change
  • Step-wise changes are OK (big bang is not required)
  • We can make trade-offs to defray expensive requirements

• Beyond Assumptions… Paradigm Shifts
  • There are shared best practices
  • Upgrades can be automated in a sharable way

http://www.theemailadmin.com/wp-content/uploads/2012/09/GFI229-hot-water-migration.jpg

Potential Solution #1: On-The-Fly
All the nodes update to the latest code in a short time window

• Details:
  1. Cookbooks include upgrade (instead of install) directives.
  2. Control the upstream package point (e.g. run apt-get update when appropriate).
  3. Force a chef-client run.
  4. Now at the new level.

• Considerations
  • Pros: Potentially fast, continuous operation
  • Cons: Don't mess up; it is your production environment
  • Scope: Security updates
  • Code assumptions:
    • The system can function through service restarts.
    • Underlying data models either don't change or migrate appropriately.
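The following is a minimal sketch of the on-the-fly steps. It assumes Debian/Ubuntu nodes already managed by Chef, with cookbooks that declare upgrade actions. Every node runs `apt-get update` and then `chef-client` in the same window, and each node stops at its first failing command. The node names, `UPGRADE_STEPS`, and the `dry_run` runner are hypothetical. The default runner only prints commands; a real runner (for example, one that shells out over SSH) would have to be supplied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical per-node commands, assuming Debian/Ubuntu nodes managed by Chef.
UPGRADE_STEPS: List[List[str]] = [
    ["apt-get", "update"],  # move the upstream package point
    ["chef-client"],        # force a run so the cookbooks' upgrade actions apply
]

# A runner executes one command on one node and returns its exit code.
Runner = Callable[[str, List[str]], int]


def dry_run(node: str, cmd: List[str]) -> int:
    """Hypothetical runner that only prints the command it would execute."""
    print(f"[{node}] {' '.join(cmd)}")
    return 0


def upgrade_node(node: str, run: Runner) -> bool:
    """Run each step on one node, stopping at the first non-zero exit code."""
    for cmd in UPGRADE_STEPS:
        if run(node, cmd) != 0:
            return False
    return True


def on_the_fly(nodes: List[str], run: Runner = dry_run) -> Dict[str, bool]:
    """Upgrade every node concurrently and report per-node success."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = list(pool.map(lambda node: upgrade_node(node, run), nodes))
    return dict(zip(nodes, results))


if __name__ == "__main__":
    print(on_the_fly(["controller-1", "compute-1", "compute-2"]))
```

Because every node moves at once, a single bad cookbook change reaches the whole production fleet in the same window. That is the risk the Cons bullet warns about.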

Potential Solution #2: Split/Migrate/Replace
Nodes migrate in staged groups

• Details:
  1. Choose a subset of machines and quiesce them.
  2. Update that set.
  3. Freeze state (by tenant).
  4. Migrate service/tenant content.
  5. Repurpose the freed machines once migration is complete.

• Considerations
  • Pros: Safer, more controlled, and can move tenants as needed
  • Cons: Takes longer and still has a cut-over point, but a less open one

http://allgodscrittersgotrhythm.blogspot.com/2010_08_01_archive.html
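The sketch below is one way to read the five steps as a plan, under stated assumptions. Each stage quiesces and updates one group of machines, then freezes and migrates one wave of tenants onto that group. Any old-release machine left with no tenants is repurposed, for example as part of a later group. The `placement` map (tenant to current machine), the group and wave lists, and all names are hypothetical. The function only produces an ordered list of `(action, target)` steps and does not touch real infrastructure.

```python
from collections import Counter
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (action, target)


def split_migrate_replace(groups: List[List[str]], waves: List[List[str]],
                          placement: Dict[str, str]) -> List[Step]:
    """Build an ordered Split/Migrate/Replace plan.

    groups[i]: machines quiesced and updated in stage i (steps 1 and 2).
    waves[i]: tenants frozen and migrated onto that group in stage i (steps 3 and 4).
    placement: tenant -> old-release machine currently hosting it.
    """
    if len(groups) != len(waves):
        raise ValueError("need exactly one tenant wave per machine group")
    remaining = Counter(placement.values())  # tenants still on each old machine
    plan: List[Step] = []
    for group, wave in zip(groups, waves):
        plan += [("quiesce", node) for node in group]  # 1. drain the subset
        plan += [("update", node) for node in group]   # 2. move it to the new release
        for tenant in wave:
            plan.append(("freeze", tenant))             # 3. freeze this tenant's state
            plan.append(("migrate", tenant))            # 4. move its content over
            source = placement[tenant]
            remaining[source] -= 1
            if remaining[source] == 0:
                plan.append(("repurpose", source))      # 5. old machine is now empty
    return plan


if __name__ == "__main__":
    placement = {"tenant-a": "old-1", "tenant-b": "old-1", "tenant-c": "old-2"}
    for step in split_migrate_replace([["spare-1"], ["old-1"]],
                                      [["tenant-a", "tenant-b"], ["tenant-c"]],
                                      placement):
        print(step)
```

In the example, the repurposed `old-1` is fed into the next group. That reuse keeps the spare capacity this pattern needs small, at the cost of the longer timeline noted in the Cons.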

Potential Solution #3: Rolling Upgrade
Nodes are changed individually by a system-wide orchestration that supports components at multiple versions

• Details:
  1. Components must be able to straddle versions.
  2. Orchestration updates core components to the new version.
  3. The system as a whole quiesces and is validated (requires self-test).
  4. Orchestration migrates components individually (return to step 3).

• Considerations
  • Pros: Creates a highly resilient system that handles a higher rate of change
  • Cons: More complex to create and maintain

http://www.grizzlycentral.com/forum/grizzly-tire-wheel-combos/1204-upgrade-tires-grizzly.html
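The rolling loop in steps 1 to 4 can be sketched as follows. The sketch makes three assumptions:
• There are only two releases, Folsom then Grizzly.
• "Straddling versions" means every component tolerates peers one release apart.
• A caller-supplied `self_test` stands in for the system-wide validation.
The `Component` type, the component names, and the roll-back-and-halt policy are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

RELEASES = ["folsom", "grizzly"]  # assumed ordering, oldest first


@dataclass
class Component:
    name: str
    release: int        # index into RELEASES
    core: bool = False  # core components move first (step 2)


def straddles(components: List[Component]) -> bool:
    """Step 1 assumption: a mixed deployment may span at most two adjacent releases."""
    releases = {c.release for c in components}
    return not releases or max(releases) - min(releases) <= 1


def rolling_upgrade(components: List[Component], target: int,
                    self_test: Callable[[List[Component]], bool]) -> List[str]:
    """Upgrade core components together, then the rest one at a time.

    After every change the whole system is validated (step 3); on failure the
    last change is rolled back and the upgrade halts.
    """
    core = [c for c in components if c.core]
    rest = [c for c in components if not c.core]
    log: List[str] = []
    for unit in [core] + [[c] for c in rest]:
        pending = [c for c in unit if c.release < target]
        if not pending:
            continue
        saved = [c.release for c in pending]
        for c in pending:
            c.release = target
        names = ", ".join(c.name for c in pending)
        if not (straddles(components) and self_test(components)):
            for c, old in zip(pending, saved):
                c.release = old  # roll back only the unit that failed validation
            log.append(f"halt at {names}")
            return log
        log.append(f"{names} -> {RELEASES[target]}")
    return log


if __name__ == "__main__":
    fleet = [Component("keystone", 0, core=True), Component("nova-api", 0, core=True),
             Component("compute-1", 0), Component("compute-2", 0)]
    print(rolling_upgrade(fleet, 1, lambda cs: True))
```

The validation after each single-component unit is step 4's "return to step 3". The version tolerance, self-test, and roll-back logic are the extra machinery behind the "more complex to create and maintain" Cons.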

Making this work requires
• Orchestration (not just deployment automation)
• Awareness of physical layout
  • Must respect fault zones to sustain HA
  • Proximity of resources matters for migration
  • Networking transitions are essential

• Collaboration with development teams is essential
  • Components must support current and previous versions
  • The upgrade plan must be baked into configuration and tested
  • Upgrade dependencies must be 1) clear and 2) minimized

• HA complicates upgrades
  • An upgrade can be detected as a failure
  • The HA system must be able to bridge versions
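To make "respect fault zones to sustain HA" concrete, here is a minimal sketch that orders upgrade batches one fault zone at a time and never mixes zones in a batch. It assumes HA replicas are spread across zones, so the replicas in untouched zones keep serving while one zone upgrades. The node-to-zone map, `max_batch`, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def zone_waves(node_zone: Dict[str, str], max_batch: int) -> List[List[str]]:
    """Return upgrade batches, finishing one fault zone before starting the next.

    Assumes HA replicas are spread across zones, so while one zone is being
    upgraded the replicas in the other zones keep the service available.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    by_zone: Dict[str, List[str]] = defaultdict(list)
    for node, zone in node_zone.items():
        by_zone[zone].append(node)
    if len(by_zone) < 2:
        # With one zone there is no second zone guaranteed to hold replicas.
        raise ValueError("need at least two fault zones to keep HA during upgrade")
    waves: List[List[str]] = []
    for zone in sorted(by_zone):
        nodes = sorted(by_zone[zone])
        # Chunk the zone into batches of at most max_batch nodes.
        waves += [nodes[i:i + max_batch] for i in range(0, len(nodes), max_batch)]
    return waves


if __name__ == "__main__":
    layout = {"c1": "rack-a", "c2": "rack-a", "c3": "rack-a", "c4": "rack-b", "c5": "rack-b"}
    for batch in zone_waves(layout, 2):
        print(batch)
```

`max_batch` trades upgrade speed against how much capacity a zone loses at once. The proximity and networking concerns listed above are not modeled here.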

Want to Play Along?
• Deployment Upstream Cookbooks/Modules
• Best Practice Discussions
• Code for Upgradeability

• Crowbar Collaboration
  • Upgrade is a FEATURE!
  • Orchestration + Chef
  • Pull from Source Deployments
  • System Discovery
  • Networking Configuration
  • Operating System Install

http://farm3.static.flickr.com/2561/3891653055_262410bc31.jpg
