DevOpsDays Austin: Helping Horses Become Unicorns, Chef's Operations Maturity Model

Upload: matt-ray

Post on 27-Aug-2014

Category:

Software


DESCRIPTION

Helping customers evaluate their ability to deploy and operate systems while managing incidents is key to our Consulting practice. We have developed an operations maturity model that provides a roadmap for understanding and improving mean time to production while setting realistic expectations. This session will explain the challenges and thresholds for becoming a more effective organization.

TRANSCRIPT

Page 1: DevOpsDays Austin: Helping Horses Become Unicorns, Chef's Operations Maturity Model
Page 2:

Chef’s Operations Maturity Model: Helping Horses Become Unicorns

Matt Ray

DevOpsDays Austin

May 5, 2014

Page 3:

Introductions

• Matt Ray

• Director Partner Integration at Chef

[email protected]

• mattray GitHub|IRC|Twitter

Page 4:

“There’s nothing horses hate more than hearing stories about unicorns.”

John Arbuckle, Chief Architect at GE Capital, "Hunting the DevOps Whale in Large Enterprises", ChefConf 2014

Page 5:

http://pichost.me/1468004/

DevOps Unicorns

• Etsy

• Facebook

• Netflix

Page 6:

https://keepinghouseandhorse.files.wordpress.com/2013/10/photoshop3.jpeg

But… Enterprise

• Our applications are too complex

• Politics get in the way

• We’ve always done it this way

Page 7:

It’s Not Magic

• Not everyone requires Continuous Delivery

• They require:

  • Higher reliability

  • Greater visibility

  • More resilience

  • Faster response

Page 8:

https://img0.etsystatic.com/000/0/5209298/il_fullxfull.282855902.jpg

How Do We Get There?

Page 9:

The Map is not the Territory

• Comparative study of Operational Maturity Models

• On one end: ad-hoc, slow to respond, “traditional” approach

• At the other: very fast, fully automated, and disaster indifferent

• Figure out what is most important to your Organization

https://www.chimacumtack.com/images/measurehorse.jpg

Page 10:

Fitting the Model

• Varying degrees of adoption

• Operational trends often correlated and relational, but not definitive

• Roadmap for improving time to deployment and lowering time to recovery

• Understand the challenges, set real expectations for progress

http://www.web3dservice.com/3d_models/images/unicorn_3d_model_03.jpg

Page 11:

Roadmap Considerations

• Hardware Management

• OS Management

• Infrastructure Management

• Software Deployments

• Incident Management

• Disaster Recovery

http://cultofunicorn.com/wp-content/uploads/2013/05/Unicorn_horse.jpg

Page 12:

Hardware Management

Page 13:

Every Server is Sacred!

• HA Support expected across the entire stack

• Dependence on vendor/on-site SE for replacement/maintenance

• “This is the best hardware money can buy!”

• Architecture Review and Request Forms for all changes

• “Tier 1” data centers

• Every project is a special snowflake

Page 14:

1 SysAdmin to 25-250 systems?

Automate Common Tasks

Page 15:

Maybe not ALL servers are sacred…

• Start using some farms of standardized machines

• Fewer support contracts, less dependence on vendor/on-site support

• Architecture Reviews for new services with some implementation standardization

• HA support across most of the stack

• Probably still using “Tier 1” data centers with excess redundancy

Page 16:

1 Systems Engineer to 250-500 systems

Configuration Management

Page 17:

Most of these servers aren’t sacred?

• Limited support on ALL systems

• On-site support used sparingly, lower-skill onsite staff for “normal” failures

• Architecture Reviews only manage exceptions. Automated requests may be exposed via emerging APIs

• Wide adoption of virtualization: server instances are commoditized

• Hardware becoming standardized and easy to replace

• Smaller, more efficient data centers.

• Limited redundancy with hot/hot/hot N+1/N HA strategies

Page 18:

Application Management

1 Systems Engineer to 500-1000 Systems

Page 19:

None of the servers are sacred

• Infrastructure as a Service

• Hardware (if any) is fully commoditized

• Hardware is completely standardized, special cases are regarded as a risk to business

• Redundant Array of Inexpensive Data centers

Page 20:

1 Site Reliability Engineer to 1000+ Systems

Continuous Delivery
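
The staffing ratios on the Hardware Management slides pair each band of systems per engineer with a stage label. A minimal sketch of that pairing as a lookup, in Python. The function name `stage_for_ratio` and the table `RATIO_BANDS` are hypothetical. The slides' bands share their endpoints (250, 500, 1000), so this sketch assumes a shared endpoint belongs to the higher band, and it returns nothing for ratios below 25, which the slides do not cover.

```python
from typing import Optional

# (low, high, stage) bands copied from the slides; high=None means open-ended ("1000+").
RATIO_BANDS = [
    (25, 250, "Automate Common Tasks"),
    (250, 500, "Configuration Management"),
    (500, 1000, "Application Management"),
    (1000, None, "Continuous Delivery"),
]


def stage_for_ratio(systems_per_engineer: float) -> Optional[str]:
    """Return the stage label whose band contains the ratio.

    Assumption (not from the slides): bands are half-open [low, high),
    so a shared endpoint such as 250 falls into the higher band.
    Ratios below 25 are outside every band and yield None.
    """
    for low, high, stage in RATIO_BANDS:
        if systems_per_engineer >= low and (high is None or systems_per_engineer < high):
            return stage
    return None


if __name__ == "__main__":
    for ratio in (10, 100, 250, 750, 1000):
        print(ratio, stage_for_ratio(ratio))
```

The ratio is only one signal. The deck itself notes that operational trends are correlated but not definitive, so a lookup like this is a starting point for discussion rather than a score.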

Page 22:

Operating System Management

Page 23:

Operating Systems Management

• Many OS flavors and versions. Manual, irregular patching

• Limited flavors and versions, planned upgrades. “Patch Tuesday!”

• Standard versions using JEOS with regular upgrades. Automated patching.

• Internally maintained versions, constant upgrades

Page 24:

http://www.smallwebs.com/Swords/images/UK1796HC2d/SCOTLANDFOREVER2.jpg

Incident Management

Page 25:

Incident Threshold: Recovery Time

• Which teams have regular on call responsibilities?

• What is expected of someone on call?

• How are people notified & engaged on an incident?

Page 26:

Incident Threshold: Recovery Time

• "Something is wrong!": 12+ hours

• "Something is wrong with the…!": 1-12 hours

• "Something went wrong with your deployment!": <60 minutes

• "The core infrastructure fabric is down!": seconds to 10 minutes
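
The four recovery-time bands above can be read as a simple classification of a measured recovery time. A minimal sketch in Python, using only the standard library. The function name `recovery_band` is hypothetical, and the boundary handling is an assumption: the slide's bands overlap (anything under 10 minutes is also under 60 minutes), so this sketch assigns each duration to the fastest band it fits, and treats exactly 12 hours as part of the 1-12 hour band.

```python
from datetime import timedelta


def recovery_band(recovery_time: timedelta) -> str:
    """Map a measured recovery time to one of the four bands on the slide.

    Assumptions (not from the slide): overlapping bands resolve to the
    fastest one, and exactly 12 hours counts as "1-12 hours".
    """
    if recovery_time < timedelta(0):
        raise ValueError("recovery time cannot be negative")
    if recovery_time <= timedelta(minutes=10):
        return "seconds to 10 minutes"
    if recovery_time < timedelta(minutes=60):
        return "<60 minutes"
    if recovery_time <= timedelta(hours=12):
        return "1-12 hours"
    return "12+ hours"


if __name__ == "__main__":
    for sample in (timedelta(minutes=5), timedelta(minutes=45),
                   timedelta(hours=3), timedelta(hours=20)):
        print(sample, "->", recovery_band(sample))
```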

Page 27:

Postmortems

http://photography.nationalgeographic.com/photography/photo-of-the-day/

Page 28:

Postmortems

• Postmortem Focus

• Root Cause Orientation

• Root Cause Mitigation/Resolution

• Root Cause Elimination Rate

http://img3.wikia.nocookie.net/__cb20111008164412/mlpfanart/images/thumb/b/b2/Twilight_Sparkle_Angry_by_Ivan-Chan.png/597px-Twilight_Sparkle_Angry_by_Ivan-Chan.png

Page 29:

Postmortems: Ad Hoc

• "Human Error": blame finding & punishment

• "Triggering Event": blaming specific operator error or specific hardware failures

• Cycle between protecting heroes and then firing them

• <10% - Mostly break fix detection

Page 30:

Postmortems: Formal

• Focus on "Triggering Event" or "Human Error", but blaming process and/or infrastructure

• "Let's implement more process and overhead"

• 10% within 3 months - mostly simple fixes

• Tracking but little progress against goals vs. other priorities, frequent recurrence

Page 31:

Postmortems: Officially "Blame Free"

• Primary focus on underlying technical root causes, systemic fixes

• Improved tooling, programmatic checks, operator tools for special cases. Some focus on building resiliency

• 20% - Easily fixable issues eliminated within 3 months, programs to eliminate larger issues over time

Page 32:

Postmortems: “5 Whys”

• Including business and cultural issues

• Primary focus on insights and opportunities from lessons learned

• Increased resiliency and appropriate operator tools, focus on self-healing fixes

• Recurrence becomes infrequent and is a big deal
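
The postmortem slides quote a Root Cause Elimination Rate for each level (under 10%, 10% within 3 months, 20% within 3 months) without defining how it is measured. One possible reading, sketched in Python: the share of identified root causes that were eliminated within a fixed window of being identified. The name `elimination_rate`, the record layout, and the 90-day stand-in for "3 months" are all assumptions for illustration, not the deck's definition.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical record: (date identified, date eliminated or None if still open).
RootCause = Tuple[date, Optional[date]]


def elimination_rate(causes: Sequence[RootCause],
                     window: timedelta = timedelta(days=90)) -> float:
    """Fraction of root causes eliminated within `window` of being identified.

    Assumption: "3 months" is approximated as 90 days. Returns 0.0 when
    there are no recorded root causes.
    """
    if not causes:
        return 0.0
    eliminated = sum(
        1
        for identified, fixed in causes
        if fixed is not None and fixed - identified <= window
    )
    return eliminated / len(causes)


if __name__ == "__main__":
    sample = [
        (date(2014, 1, 6), date(2014, 2, 3)),   # fixed in 4 weeks
        (date(2014, 1, 6), date(2014, 9, 1)),   # fixed, but after the window
        (date(2014, 2, 10), None),              # still open
        (date(2014, 3, 3), None),
        (date(2014, 3, 17), None),
    ]
    print(f"{elimination_rate(sample):.0%}")
```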

Page 33:

Navigating the Change

• Many more mile markers

• Roadmap to improve your:

  • Mean Time to Production

  • Mean Time to Recovery

Page 34:

Becoming a Unicorn is Possible

• Approach the challenges with realistic expectations for your organization

• Always room for improvement

• Culture trumps everything

http://webecoist.momtastic.com/wp-content/uploads/2010/09/unicorns_3x.jpg

Page 35:

Where Can I Download It?

bit.ly/Chef-OMM

Page 36:

Thanks!

Matt Ray [email protected] @mattray

Thanks to George Miranda, Paul Edelhertz & Jesse Robbins
