A Tale of Two Workflows - ChefConf 2014

A Tale of Two Workflows Pete Cheslock @petecheslock

Upload: pete-cheslock

Post on 10-May-2015


DESCRIPTION

Watch this talk here: https://www.youtube.com/watch?v=L__8o02od6Q

For an example of the code we used in our CI pipeline to make a Chef Environment from a Berksfile.lock, check out this project: https://github.com/petecheslock/berks2env

One of the biggest advantages of Chef is its flexibility, allowing you to customize it at will to fit your infrastructure needs. While this makes Chef incredibly powerful, it can also be challenging to develop a workflow to manage the day-to-day usage of Chef. Should I use a single repo for all my cookbooks? One cookbook per repo? Berkshelf? Librarian? Test-Kitchen? Where does Jenkins (CI) fit in? What about testing? How does this work with my small team? What about my large team? What about my distributed team?

Over the past few years I have been a part of two distinct Chef workflows that take opposite paths to solving issues around collaboration, versioning, testing, etc. During the course of this talk I will share:

• Details about the requirements that led us down these 2 paths.
• What worked.
• What didn't.
• How we use many of the tools available to safely test code changes.
• How we deploy cookbook changes safely and quickly (and keep uptime our highest priority).

TRANSCRIPT

Page 1: A Tale of Two Workflows - ChefConf 2014

A Tale of Two Workflows

Pete Cheslock @petecheslock

Page 2: A Tale of Two Workflows - ChefConf 2014

Age of Wisdom?

Page 3: A Tale of Two Workflows - ChefConf 2014

Age of Foolishness?

Page 4: A Tale of Two Workflows - ChefConf 2014

Who Am I?

Pete Cheslock

Currently - Rabble Rouser at Dyn


Previously at Sonian - one of the very early Opscode Chef™ Customers (probably?). Also Sensu.

Page 5: A Tale of Two Workflows - ChefConf 2014

Disclaimer

WARNING: THIS TALK FEATURES TWO CRAZY ASS WAYS YOU CAN USE CHEF AND IS INTENDED FOR A MATURE AUDIENCE. PETE CHESLOCK DOES NOT CONDONE THE WORKFLOWS USED AND DISCOURAGES ANYONE FROM ATTEMPTING THEM.

Page 6: A Tale of Two Workflows - ChefConf 2014

Disclaimer

WARNING: THIS TALK FEATURES TWO CRAZY ASS WAYS YOU CAN USE CHEF AND IS INTENDED FOR A MATURE AUDIENCE. PETE CHESLOCK DOES NOT CONDONE THE WORKFLOWS USED AND DISCOURAGES ANYONE FROM ATTEMPTING THEM.

THIS TALK MAY ANGER YOU - I’M HERE IF YOU NEED A HUG AFTERWARDS

Page 7: A Tale of Two Workflows - ChefConf 2014

Double Disclaimer

For the love of all that is DevOps..

Page 8: A Tale of Two Workflows - ChefConf 2014

Double Disclaimer

For the love of all that is DevOps..

Please don’t Cargo Cult this.

Page 9: A Tale of Two Workflows - ChefConf 2014

What do you do here?

I’m a people person - I swear.

Page 10: A Tale of Two Workflows - ChefConf 2014
Page 11: A Tale of Two Workflows - ChefConf 2014

Biases rule everything around me

Page 12: A Tale of Two Workflows - ChefConf 2014

Chef

The cause of... and solution to... all of life's problems.

Page 13: A Tale of Two Workflows - ChefConf 2014

Environments

Databags

Roles are good

Roles are bad

WTF is a Berkshelf ?

Librarian?

Chef Server

Chef Zero

Vagrant-Berkswhat?

Hosted Chef

LWRPs

Don’t Use Definitions!

Definitions are Awesome!

Page 14: A Tale of Two Workflows - ChefConf 2014

Pick Your Poison

Page 15: A Tale of Two Workflows - ChefConf 2014

Sonian

Founded 2008

2008 AWS Startup Challenge Finalist

I joined in 2009

Very early Chef user - Originally with Puppet (before Opscode existed)

Pre-Databags, Roles, etc, etc.

Massive growth in a short time - reaching 100s of TBs of ElasticSearch and well over a PB of S3 storage.

Page 16: A Tale of Two Workflows - ChefConf 2014

https://github.com/opscode/chef-repo

.chef/knife.rb

cookbooks

data_bags

environments

roles

Page 17: A Tale of Two Workflows - ChefConf 2014
Page 18: A Tale of Two Workflows - ChefConf 2014
Page 19: A Tale of Two Workflows - ChefConf 2014

Soon - business started to pick up - very quickly.

Speed picked up, things moved fast and we broke stuff

Page 20: A Tale of Two Workflows - ChefConf 2014

Soon - business started to pick up - very quickly.

Speed picked up, things moved fast and we broke stuff

To close some deals we had contracts signed that would limit when we could push changes to the systems.

Page 21: A Tale of Two Workflows - ChefConf 2014

Customer A: HEAD

sonian/chef-repo:master

Customer B: fd50a5c

Customer C: sonian/chef-repo:tag-v0.1.1

a1add77

Page 22: A Tale of Two Workflows - ChefConf 2014

Customer A: HEAD

sonian/chef-repo:master

Customer B: fd50a5c

Customer C: sonian/chef-repo:tag-v0.1.1

HEAD

Page 23: A Tale of Two Workflows - ChefConf 2014
Page 24: A Tale of Two Workflows - ChefConf 2014

Now imagine that scenario with 20 environments - Each environment living either on AWS, Rackspace Cloud, HP Cloud or IBM “SmartCloud”

Each environment has a different contracted deployment schedule.

I know what you are thinking - system changes aren’t a “deploy” - well next time I’ll bring you to meet with the lawyers on that.

Page 25: A Tale of Two Workflows - ChefConf 2014

How did this work in practice?

In the past we’d push a small change to Prod - everything would break terribly. Lots of technical debt - scenarios that no one could ever believe could happen

This is email archiving - in some cases customers would have mail forwarded to us via their mail server. We CAN NOT drop that mail. If they are audited and we are proven to be missing data - that is really, really bad. Srs super bad.

Page 26: A Tale of Two Workflows - ChefConf 2014

We liked our single Chef-repo

Every story had a branch - and we got into the cycle of commit, merge, push and test

Represented our pre-prod environments as branches in git - using some internal tooling to manage.

Page 27: A Tale of Two Workflows - ChefConf 2014

eng-9999

HEAD (master)

QA (Daily)

Dev (Daily)

Cut a new branch from master

Developer adds commits and tests locally

Developer merges to dev branch for dev testing

If things “work” and nothing breaks - merge to QA

If it passes regression testing - merge into master (with others)

Page 28: A Tale of Two Workflows - ChefConf 2014

• roles/stack.rb
• base.rb
• nonprod.rb
• cloud.rb (ec2, rackspace)

• roles/application.rb
• application.rb
• service.rb
• etc.rb

“Hold on a minute. I’m just going to push this small change to this one role.”

It’s roles all the way down

Page 29: A Tale of Two Workflows - ChefConf 2014

We got burned all the time.

“Move Fast and Break Everything”

Needed something that worked for today & the future

Let’s create a Git branching strategy!

Page 30: A Tale of Two Workflows - ChefConf 2014

Wut?

I know.

Seriously. I know.

We were trying to answer this one question.

“How do you version the cookbooks, roles, and databags as one singular asset?”

Page 31: A Tale of Two Workflows - ChefConf 2014

release/2011-08-01

release/2011-07-01

master (HEAD)

Page 32: A Tale of Two Workflows - ChefConf 2014

release/2011-08-01

base/2011-07-01

release/2011-07-01

master (HEAD)

Cut a new branch for the release

At the same time create a base/ release tag.

Page 33: A Tale of Two Workflows - ChefConf 2014

release/2011-08-01

base/2011-07-01

release/2011-07-01

master (HEAD)

Cut a new branch for the release

At the same time create a base/ release tag.

QA

New code constantly hitting master

Page 34: A Tale of Two Workflows - ChefConf 2014

release/2011-08-01 eng-9999

base/2011-07-01

release/2011-07-01

master (HEAD)

Cut a new branch for the release

At the same time create a base/ release tag.

QA

New code constantly hitting master

Checkout a branch from the Base Tag

Merge code into Release branch

Merge into master if you want it to advance

Page 35: A Tale of Two Workflows - ChefConf 2014

base/2011-08-01

release/2011-08-01 eng-9999

base/2011-07-01

release/2011-07-01

master (HEAD)

Cut a new branch for the release

At the same time create a base/ release tag.

Page 36: A Tale of Two Workflows - ChefConf 2014

base/2011-08-01

release/2011-08-01 eng-9999

base/2011-07-01

release/2011-07-01

master (HEAD)

Make individual commits and Cherry-pick forward

Cut a new branch for the release

At the same time create a base/ release tag.

Page 37: A Tale of Two Workflows - ChefConf 2014

base/2011-08-01

release/2011-08-01 eng-9999

base/2011-07-01

release/2011-07-01

master (HEAD)

Cut a new branch for the release

At the same time create a base/ release tag.

Rebase & Squash commits branches

Backwards

Page 38: A Tale of Two Workflows - ChefConf 2014

That sounds overly complex

We had some git experts - and it leveled up our game.

Extensive tooling around our branching strategy.

We were Release Engineering.

Page 39: A Tale of Two Workflows - ChefConf 2014

https://github.com/sniperd/mise-en-place

Page 40: A Tale of Two Workflows - ChefConf 2014
Page 41: A Tale of Two Workflows - ChefConf 2014

So What Happened?

It actually worked.

Not only that - it really worked well.

20+ Stacks, upgrading 4 per night (6pm to 12am if you are lucky)

Before “Deploy Week” - we deployed all the time - and things broke all the time.

Page 42: A Tale of Two Workflows - ChefConf 2014

Over the course of about 12 months we went from:

Deploy whenever - things break randomly (little testing)

Create a multi-page deploy checklist of mostly manual items

“Deploy Week” - 20 Stacks over 5 days (6pm to 12am - hopefully)

“Deploy Day” - 20 Stacks over one night - 6pm to 9pm

“Deploy Day” - Saturday (contracts) - Best time was 20+ stacks ~1 hour

Page 43: A Tale of Two Workflows - ChefConf 2014

Deploys were drama free

They were drama free because we tested all the pieces that changed together. And not just unit and integration testing, but full-on regression testing and user acceptance testing.

DataBags, Roles, Cookbooks, Application Code - It all moved together.

Tooling was built to support the support team (who eventually did the deploys)

High communication and tight teamwork allowed this to work.

Page 44: A Tale of Two Workflows - ChefConf 2014
Page 45: A Tale of Two Workflows - ChefConf 2014

“If I could do it all over again I would do it very differently”

Page 46: A Tale of Two Workflows - ChefConf 2014

Dyn

Incorporated in 2001, Dyn’s global presence services more than four million enterprise, small business and personal customers.

We specialize in Traffic Management & Message Management

I joined early in 2013 to run the System Automation and Release Engineering Team

(We call it DevTools)

Page 47: A Tale of Two Workflows - ChefConf 2014
Page 48: A Tale of Two Workflows - ChefConf 2014

There is always technical debt in the banana stand

Page 49: A Tale of Two Workflows - ChefConf 2014

Chef

CFEngine

Puppet

NIH

Page 50: A Tale of Two Workflows - ChefConf 2014
Page 51: A Tale of Two Workflows - ChefConf 2014
Page 52: A Tale of Two Workflows - ChefConf 2014
Page 53: A Tale of Two Workflows - ChefConf 2014

Develop a pipeline that allows for simple usage by plugging it into a CI system for automated testing and deployment.

Page 54: A Tale of Two Workflows - ChefConf 2014


But the hardest challenge is that change is dangerous. It’s even more frightening when you have a MASSIVE chunk of the internet depending on you to stay running ALL THE TIME.

Page 55: A Tale of Two Workflows - ChefConf 2014

Do it w/o taking down the internet

If we don’t build in the necessary gates and levers to allow for lots of testing and controlled deploy options out to our edge systems, bad things can happen.

Page 56: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 57: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 58: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 59: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 60: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 61: A Tale of Two Workflows - ChefConf 2014

Scope of bad

Page 62: A Tale of Two Workflows - ChefConf 2014

Initial Challenges

We have lots of FreeBSD

Change is hard - especially to unknown systems.

We really wanted to deploy a solution that was going to bring in Zero Dependencies.

Page 63: A Tale of Two Workflows - ChefConf 2014
Page 64: A Tale of Two Workflows - ChefConf 2014
Page 65: A Tale of Two Workflows - ChefConf 2014

I heard you like FreeBSD…

Page 66: A Tale of Two Workflows - ChefConf 2014

Now that FreeBSD problem is solved - we were able to start deploying Chef out to all our nodes.

We created a role[base] - which includes a run list of the things we wanted in place.

About a month or so later - we wanted to push a change to that role - at the same time it was linked to some specific cookbook versions.

Page 67: A Tale of Two Workflows - ChefConf 2014

So basically we wanted a versioned run list - but we also wanted to set and override some attributes.

So we decided to move away from roles (since we were not using them much yet) and just focus on using wrapper recipes.

The bonus here is that any person can just clone a cookbook - and run Test-Kitchen & Serverspec on that “role” to get a node just like it. No dealing with roles from other cookbooks.

Page 68: A Tale of Two Workflows - ChefConf 2014

Roles vs. No Roles

Page 69: A Tale of Two Workflows - ChefConf 2014

The wrapper recipe idea made sense to us because we wanted to make sure that when we used community cookbooks - we never edited them. So for example we have a dyn_ci recipe which wraps the functionality inside of the Jenkins recipe.

When Jenkins updates from 1.0 to 2.0 - we simply update and refactor our wrapper cookbook and set the version constraint in the metadata as appropriate.
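To make the version constraint idea concrete: one common way to pin a wrapped cookbook in metadata is the pessimistic operator (`~>`). The talk does not say which operator was used, so treat the following as a minimal sketch of how such a constraint behaves, with hypothetical helper names and plain Python rather than Chef's own resolver.

```python
def parse(version):
    """Split a dotted version string such as '1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(parts, length=3):
    """Pad a version tuple with zeros so '1.0' compares like '1.0.0'."""
    return parts + (0,) * (length - len(parts))


def satisfies_pessimistic(version, constraint):
    """Return True if `version` satisfies `~> constraint`.

    `~> 1.0` allows anything from 1.0 up to (not including) 2.0;
    `~> 1.2.3` allows anything from 1.2.3 up to (not including) 1.3.
    """
    floor = parse(constraint)
    if len(floor) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    # Upper bound: drop the last segment, increment the one before it.
    ceiling = floor[:-2] + (floor[-2] + 1,)
    candidate = pad(parse(version))
    return pad(floor) <= candidate < ceiling
```

With a `~> 1.0` constraint the wrapper keeps accepting 1.x releases of the wrapped cookbook and refuses 2.0 until the wrapper has been refactored and its constraint raised.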

Page 70: A Tale of Two Workflows - ChefConf 2014

Circular Dependency

Page 71: A Tale of Two Workflows - ChefConf 2014

We use the default chef-full template and it has a section that looks like this:

Page 72: A Tale of Two Workflows - ChefConf 2014


Where are most community cookbooks stored? github.com & community.opscode.com. Who does their DNS? You see where we are going.

So - we created a new organization on our Enterprise Chef Server - called the cookbook repo, where we stored community cookbooks we used.

Page 73: A Tale of Two Workflows - ChefConf 2014

Later we moved those to a local GitHub Enterprise for 2 reasons.

1. It allowed anyone to easily see which cookbooks we already had locally.

2. It allowed us to run short-lived forks of those cookbooks while we pushed the changes upstream to the owner (and for people to see those changes).

Page 74: A Tale of Two Workflows - ChefConf 2014

Remove the humans from the equation


Foodcritic, chefspec, rubocop, serverspec

thor-scmversion to automate versioning and git tagging.

Page 75: A Tale of Two Workflows - ChefConf 2014

Run will execute - if the tests pass - thor will version based on #patch, #minor, #major
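The tag-driven bump can be sketched roughly as follows. This is not thor-scmversion's code; it is a small Python illustration of the idea, and the function names, the "most significant tag wins" rule, and the choice to leave the version untouched when no tag is present are all assumptions of this sketch.

```python
import re

# Ordered from most to least significant.
BUMP_TAGS = ("major", "minor", "patch")


def requested_bump(commit_messages):
    """Return the most significant bump tag found in the messages, or None."""
    found = set()
    for message in commit_messages:
        found.update(re.findall(r"#(major|minor|patch)\b", message))
    for tag in BUMP_TAGS:
        if tag in found:
            return tag
    return None


def bump_version(version, bump):
    """Bump a 'MAJOR.MINOR.PATCH' string according to the bump tag."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # no tag found: leave the version alone (assumption)


def next_version(version, commit_messages, tests_passed):
    """Only version the cookbook when the test run was green."""
    if not tests_passed:
        return version
    return bump_version(version, requested_bump(commit_messages))
```

For example, `next_version("1.0.3", ["Fix template #patch"], tests_passed=True)` yields `"1.0.4"`, while a failed test run leaves the version where it was.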

Page 76: A Tale of Two Workflows - ChefConf 2014

So we try to speed up the iteration to master

So - now the development cycle looks like

User cuts a branch - makes changes - runs tests locally (we hope) - then submits a pull request.

Jenkins tests the PR - if good - report back to GH:E with Green.

When merged - Jenkins runs the tests again - if they pass then Jenkins will tag the release and upload it to the cookbookrepo.

Page 77: A Tale of Two Workflows - ChefConf 2014

Development Deployment

Page 78: A Tale of Two Workflows - ChefConf 2014

How has this worked?

We are the product owner

On-Demand support internally

Training

Mentoring

Page 79: A Tale of Two Workflows - ChefConf 2014

All new apps come with cookbooks.

They even come with tests. (Yay!)

Test Kitchen and Berkshelf for our local development and deploy

Page 80: A Tale of Two Workflows - ChefConf 2014

github.com/dyninc/cookbookapi

So we built our own cookbook api to use (with Berks 2) that let us use our own site with our own cookbooks (and the community cookbooks in our site repo)

Page 81: A Tale of Two Workflows - ChefConf 2014

So how do you get it to production?

So - the requirements were such that we wanted a few things:

Easily be able to deploy to a single node in a site

Easily be able to deploy to a single node in many sites

Easily be able to deploy to a single node in every site

Easily be able to deploy to a single node in a region

Easily be able to deploy to a single node in many sites

…… you get the point. EVERY POSSIBLE DEPLOY SCENARIO.
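One way to picture these scenarios is as filters over a node inventory. The sketch below is purely illustrative: the inventory shape (`name`, `site`, `region` keys) and the function name are hypothetical and not taken from Dyn's tooling.

```python
from collections import defaultdict


def pick_one_per_site(nodes, sites=None, region=None):
    """Pick a single node (first by name) in every site that matches.

    `sites` limits the selection to a set of site names, `region` to one
    region; leaving both as None selects one node in every site.
    """
    by_site = defaultdict(list)
    for node in nodes:
        if sites is not None and node["site"] not in sites:
            continue
        if region is not None and node["region"] != region:
            continue
        by_site[node["site"]].append(node["name"])
    return {site: sorted(names)[0] for site, names in sorted(by_site.items())}


# Hypothetical inventory, for illustration only.
NODES = [
    {"name": "app-1", "site": "site-a", "region": "us"},
    {"name": "app-2", "site": "site-a", "region": "us"},
    {"name": "app-3", "site": "site-b", "region": "us"},
    {"name": "app-4", "site": "site-c", "region": "eu"},
]
```

Calling `pick_one_per_site(NODES, sites={"site-a"})` covers the "single node in a site" case, `pick_one_per_site(NODES, region="us")` the regional one, and `pick_one_per_site(NODES)` the "single node in every site" case.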

Page 82: A Tale of Two Workflows - ChefConf 2014
Page 83: A Tale of Two Workflows - ChefConf 2014

Represent state of chef org in Git

Act as single source of truth

Have Jenkins manage the upload of those cookbooks to prod

Ensure the environment locks those cookbooks explicitly

Page 84: A Tale of Two Workflows - ChefConf 2014

So, I already told you we didn’t use roles because we really wanted to be able to version the run list (many people other than us could be touching that).

We have thor-scmversion auto-bumping the versions of cookbooks (and freezing on upload to the package server). As one does.

We knew that when we ran a node in production - we wanted it in an environment with very specific cookbook version locks.

And we wanted those environments to be immutable. Created and uploaded in an automated way.
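A minimal sketch of what "an environment with very specific cookbook version locks" looks like as data, written in Python for illustration. It assumes the resolved versions are already available as a simple mapping (how they are read out of a Berksfile.lock is left out), and the `dyn_ci` version shown is made up.

```python
import json


def build_environment(name, locked_versions, description=""):
    """Build a Chef environment document that pins every cookbook exactly."""
    return {
        "name": name,
        "description": description,
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        "default_attributes": {},
        "override_attributes": {},
        # "= x.y.z" is an exact pin: nothing else satisfies the constraint.
        "cookbook_versions": {
            cookbook: f"= {version}"
            for cookbook, version in sorted(locked_versions.items())
        },
    }


if __name__ == "__main__":
    # Hypothetical lock data; dyn_myface v1.0.3 appears in the slides,
    # the dyn_ci version is invented for the example.
    locked = {"dyn_myface": "1.0.3", "dyn_ci": "0.2.1"}
    print(json.dumps(build_environment("1_4_125", locked), indent=2))
```

Because every entry is an exact pin, a new cookbook upload cannot change what a node in that environment converges to; changing production means creating and uploading a new environment version.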

Page 85: A Tale of Two Workflows - ChefConf 2014
Page 86: A Tale of Two Workflows - ChefConf 2014

We’ve been using thor-scm for versioning our cookbooks - why not our servers too?

Page 87: A Tale of Two Workflows - ChefConf 2014
Page 88: A Tale of Two Workflows - ChefConf 2014
Page 89: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125 1_4_124-alpha_1

app-2

app-1 1_4_LATEST

Virtual Real

=

Page 90: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

6ead49d Deploy dyn_myface v1.0.3

Virtual Real

=

Page 91: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

6ead49d Deploy dyn_myface v1.0.3

Virtual Real

=

Page 92: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

6ead49d Deploy dyn_myface v1.0.3

d6b0b7e Deploy dyn_myface v1.0.3 to all #patch

Virtual Real

=

Page 93: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

6ead49d Deploy dyn_myface v1.0.3

d6b0b7e Deploy dyn_myface v1.0.3 to all #patch

Virtual Real

=

Page 94: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

=

6ead49d Deploy dyn_myface v1.0.3

d6b0b7e Deploy dyn_myface v1.0.3 to all #patch

Virtual Real

7db580b Deploy dyn_myface v2.0.0 #minor

=

Page 95: A Tale of Two Workflows - ChefConf 2014

1_5_LATEST 1_5_0

1_4_123

1_4_LATEST 1_4_125

1_4_124-alpha_1

app-2

app-1 1_4_LATEST

=

6ead49d Deploy dyn_myface v1.0.3

d6b0b7e Deploy dyn_myface v1.0.3 to all #patch

Virtual Real

7db580b Deploy dyn_myface v2.0.0 #minor

=

Page 96: A Tale of Two Workflows - ChefConf 2014

Limited allow list for deploy

Anyone can propose a change to production - but the ops team will approve those changes. (for #patch or greater that is)

The same workflow applies to pre-release environments.

Page 97: A Tale of Two Workflows - ChefConf 2014

Databags?

Since we version all of our cookbooks using Thor-scmversion

And we do the same with chef environments.

And we need lots of flexibility with our code deployment process due to the nature of the system

We built a tool that allows us to version our databags for deploy. https://github.com/Vanders/knife-databag-version

Page 98: A Tale of Two Workflows - ChefConf 2014

Version your databags?

Seriously - what is wrong with you?

We use databags pretty sparingly - mostly just encrypted databags for shared secrets and other info.

Our engineers ask us for the flexibility - we build the tools. The tools enable the workflow.

Page 99: A Tale of Two Workflows - ChefConf 2014
Page 100: A Tale of Two Workflows - ChefConf 2014

What’s this all look like?

assume we have a simple data bag item:

Page 101: A Tale of Two Workflows - ChefConf 2014

with knife data bag version this becomes a template:

Page 102: A Tale of Two Workflows - ChefConf 2014

knife data bag version can then create a JSON file using this template:

Page 103: A Tale of Two Workflows - ChefConf 2014

knife data bag version will emit a JSON file:

Page 104: A Tale of Two Workflows - ChefConf 2014

All managed by Jenkins - hands off for the developer

Databags are versioned the same as cookbooks - and that allows for more flexible deploy options for us.

We still use standard databags - this is just another lever to pull

Page 105: A Tale of Two Workflows - ChefConf 2014

Room for improvement?

#minor and #major

Site to abstract changing cookbook versions.

Upload cookbooks early - control with environment version locks

Page 106: A Tale of Two Workflows - ChefConf 2014
Page 107: A Tale of Two Workflows - ChefConf 2014
Page 108: A Tale of Two Workflows - ChefConf 2014
Page 109: A Tale of Two Workflows - ChefConf 2014

Thank You

Pete Cheslock

[email protected]

@petecheslock

Page 110: A Tale of Two Workflows - ChefConf 2014

Thank You

Pete Cheslock

[email protected]

@petecheslock