DevOps Friday - 12th April 2013


Insight from DevOps Thought Leaders

12th APRIL 2013



Welcome to DevOps Friday! Each week, we summarise the best content of the week coming out of the DevOps community.

The newsletter is curated by DevOps enthusiast, Benjamin Wootton.

I welcome any feedback or suggestions for content for the newsletter via Twitter @benjaminwootton.

To sign up for future editions, please visit the DevOps Friday home page.


DevOps Friday is proudly supported by ServerDensity.

Server Density is a server and website monitoring tool. We're supporting DevOps Friday in its quest to continually publish bite-size insights and news into the DevOps world.

If you like this, please visit ServerDensity (http://www.serverdensity.com/) and the ServerDensity blog (http://blog.serverdensity.com/?kme=Followed+Sponsor+Link&km_email+type=devopsfriday).

Curated by Benjamin Wootton (https://twitter.com/benjaminwootton)

www.devopsfriday.com (http://www.devopsfriday.com/)


CONTENTS

GROWING AN OPS TEAM FROM ONE FOUNDER
David Mytton
In the early days of 2009, it was just me running the Server Density monitoring infrastructure. Over the last 4 years the service has grown in terms of team members, data volume, customers and infrastructure so here are a few lessons from scaling the ops team and how things are run.

MATTHEW SKELTON ON SOFTWARE OPERABILITY
Matthew Skelton
Operability is an engineering term concerning the qualities of a system which make it work well over its lifetime, and software operability applies these core engineering principles to software systems.

THE BENCHMARK YOU'RE READING IS PROBABLY WRONG
RethinkDB Team
Unfortunately, most benchmarks published online have crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks.

WHY DEVOPS MATTERS (TO DEVELOPERS)
Benjamin Wootton
DevOps stems from the idea that developers and operations should work more closely together to increase the quality of the systems that we build and operate, but most of the enthusiasm and thought leadership appears to come from the Operations side of the fence.

APPLICATION SUPPORT IS PERFECT FOR DEVOPS
Matt Watson
Finally, organizations can embrace a DevOps approach that improves application support even if they don't have a formal DevOps team. Stackify is the only solution that provides the proper access, tools and intelligence to improve application support efficiency.

THE STATE OF THE ART MONITORING STACK
Sandy Walsh
For many of us starting in this area, our concept of monitors consists of top, some apache, mysql and application log files. We were scared off of monitoring by these old monolithic products that required huge licensing fees and armies of professional services people. Thankfully, times have changed.



RELAUNCHING DEVOPS FRIDAY

Around a year ago, I started a small newsletter summarising the best DevOps related links of the week.

Since that time, interest has continued to grow in DevOps. Fantastic blog posts and articles are coming out each week, advancing the state of the art. Conferences are generating massive amounts of interesting content and discussion. Discussion on Twitter and on the podcasts is entertaining and educational.

Considering this, I want to take DevOps Friday to the next level, using it as a hub to capture and communicate the best content of each week. If you like this issue, please consider sharing with friends and colleagues and encouraging them to sign up at the DevOps Friday home page (http://www.devopsfriday.com/).

THIS WEEK IN DEVOPS

Keith and Mario's Guide to Fast Websites (http://webuild.envato.com/blog/keith-and-marios-guide-to-fast-websites/)

MongoDB Large Scale Data Centric Applications (http://www.infoq.com/presentations/MongoDB-Design)

Hiring for the DevOps Toolchain: The Need for Generalists (https://puppetlabs.com/blog/hiring-for-the-devops-toolchain-the-need-for-generalists/)

How Badly Set Goals Create a Tug-of-War in Your DevOps Organization (http://architects.dzone.com/articles/how-badly-set-goals-create-tug?mz=38541-devops)

Treating Servers as Cattle, Not as Pets (http://architects.dzone.com/articles/treating-servers-cattle-not?mz=38541-devops)

Achieving Awesomeness with Opscode Chef (Part 2) (http://www.opscode.com/blog/2013/03/18/achieving-awesomness-with-opscode-chef-part-2/)

Making A Point With SLAs (http://blog.serverdensity.com/making-a-point-with-slas/)

Amazon Cloud - A River Runs Through It

Using Message Queues in Cloud Applications (http://devops.rackspace.com/using-message-queues-in-cloud-applications.html)

Are You Unknowingly Replicating Your Failure as a DBA? (http://devops.rackspace.com/is-mysql-unknowingly-replicating-your-failure-as-a-dba.html)



APPLICATION SUPPORT IS PERFECT FOR DEVOPS

Matt Watson
@stackify (twitter.com/Stackify)

Matt Watson founded Stackify in 2012 and as CEO provides the vision and leadership for the direction of the company. Matt's goal is to simplify IT operations via Stackify's DevOps solution. Prior to founding Stackify, he was the founder and CTO of VinSolutions. Matt is an entrepreneur at heart and excels at product and software development.

When DevOps emerged in 2009, the gap between development and operations teams finally started to get the kind of media and vendor attention it deserved. DevOps gets developers more involved in IT operations so they can more rapidly resolve software issues that arise after deployment. Without access to production applications and servers, even development managers and system admins need help identifying and solving problems, which is horribly inefficient.

Some of us have been doing DevOps even before it had a name. At my last company, the lead developers were heavily involved in hardware purchases, setting hardware up, deploying code, monitoring systems and much more. The problem was that only three of the 40 developers had production access. The chosen three (including me) spent an inordinate amount of time helping others troubleshoot and fix application bugs. While I didn't trust the junior developers with the keys to the kingdom, I nevertheless would have preferred them to have the ability to fix their own bugs. Because our application support processes weren't very efficient, I wasted a lot of my own time fixing bugs instead of building new features.

Later, I started Stackify because I believe that more developers should be involved in production application support. That way, a couple of employees, like the three of us at my old job, don't become a bottleneck. Meanwhile, junior developers, QA and even less technical support people can get server access to view log files and other basic troubleshooting information. Sadly, in most companies today, the lead developer or system admin ends up tracking down a log file or finding some minor bug in another developer's app when they should be working on more important projects.

Developers should be more involved in the design and support of the infrastructure our applications rely on, since we are ultimately responsible for the applications we create. We should be able to deploy our applications, monitor production systems, ensure everything is working properly and be held responsible when our applications fail in production.

Finding and fixing bugs is often more difficult than it sounds, however. Just think for a moment. What do your developers need access to? If your team is anything like mine was, they need:

- A database of application exceptions
- Application and server log files
- Windows Event Viewer
- Application and server config files
- SQL databases to test queries
- Scheduled jobs history
- Server monitoring tools
- Performance monitoring tools

... and the list goes on.

When a developer is trying to fix a bug, nothing is more frustrating than lacking the details necessary to reproduce or fix the problem. Troubleshooting application problems can require access to a lot of information, which in turn involves a lot of screens and a lot of logins. Imagine getting all the information you need in a single screen and then having the ability to drill down into it with a couple of mouse clicks.

As nice as it sounds, giving developers access to the information they need has been harder than it should be because:

- The data resides in many locations
- Too many tools exist to access different types of information
- It can be difficult or impossible to control access rights and protect sensitive data
- Developers should be prevented from making changes
- It is difficult or impossible to audit what developers access

To overcome the challenges outlined in this post, my team and I at Stackify built a solution that gives developers access to all the information they need to provide effective application support. It also solves the problems that have prevented such information sharing in the past. With Stackify, you can eliminate bottlenecks in development teams and scale application support teams without additional head count.

Finally, organizations can embrace a DevOps approach that improves application support even if they don't have a formal DevOps team. Stackify is the only solution that provides the proper access, tools and intelligence to improve application support efficiency.

Links referenced in this article:
http://www.stackify.com/
http://www.stackify.com/defining-the-ops-in-devops/
http://www.stackify.com/server-monitoring/
http://www.stackify.com/developers-production-access-applications-support-stackify-2/
http://www.stackify.com/access-log-files/
http://www.stackify.com/application-support/



In the early days of 2009, it was just me running the Server Density monitoring infrastructure. The service came out of beta in the summer and immediately had a few paying customers which helped to fund the rental of a couple of slices from Slicehost (fancy VPSs). The volume of traffic, simplicity of the service components and small number of servers meant that there were few problems.

Over the last 4 years the service has grown in terms of team members, data volume, customers and infrastructure so here are a few lessons from scaling the ops team and how things are run.

BOOTSTRAPPING OFTEN MEANS LEAVING THINGS TO LAST MINUTE

Ideally you'll anticipate problems and have a solution well in advance, but that's not always possible. The most likely reason in the early days is cash, or lack of it.

In August of 2009 I'd just completed our migration from MySQL to MongoDB and it still had problems with eagerly eating up disk space. This prompted setting up a new server with increased disk space because resizing a Slicehost instance would've meant some hours of downtime. It went down to the very last few bytes of remaining disk space as the sync completed.

IT ALSO MEANS TRYING TO FIND THE QUICKEST WAY TO DO THINGS

Time is something you don't have much of and one of the slowest things is transferring large quantities of data over the internet. We had an unexpected failure where we had to do a full resync of a MongoDB slave in a different data centre, which would've taken 6 days. Instead, we copied the data onto a USB disk drive and had UPS ship it to the other facility. Network transfer speeds worked out at around 5MB/s whereas UPS delivered at 11MB/s.
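
The comparison can be checked with a little arithmetic. The sketch below is not from the original article: it assumes 1 MB = 10^6 bytes and uses hypothetical helper names (`dataset_size_mb`, `transfer_days`) to estimate the dataset size implied by a 6 day resync at roughly 5MB/s, and how long the same data takes at an effective 11MB/s.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def dataset_size_mb(rate_mb_per_s: float, days: float) -> float:
    """Size (in MB) of a dataset that takes `days` to move at `rate_mb_per_s`."""
    return rate_mb_per_s * days * SECONDS_PER_DAY


def transfer_days(size_mb: float, rate_mb_per_s: float) -> float:
    """Days needed to move `size_mb` at an effective rate of `rate_mb_per_s`."""
    return size_mb / rate_mb_per_s / SECONDS_PER_DAY


# Figures quoted in the article: ~5 MB/s over the network for ~6 days,
# versus an effective ~11 MB/s for the couriered disk.
size_mb = dataset_size_mb(5, 6)
courier_days = transfer_days(size_mb, 11)

print(f"Implied dataset size: {size_mb / 1_000_000:.2f} TB")
print(f"Courier time at 11 MB/s: {courier_days:.1f} days")
```

Under these assumptions the dataset works out at roughly 2.6 TB, and the couriered disk takes a little under three days door to door instead of six.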

LET OTHER PEOPLE HELP

You really need at least one other person to be able to take on-call duties when you're away but if that's not possible, or as a backup, you could make use of services provided by your hosting company or a third party.

We quickly moved from Slicehost to managed servers at Rackspace and they were able to do monitoring and respond to issues like servers down or services not running. They took special instructions for different scenarios and you could always phone them and ask them to perform certain actions. I remember several instances where I was away from my computer and was able to phone Rackspace support, asking them to perform some basic recovery actions whilst I got back online.

CONSIDER SUPPORT CONTRACTS

In addition to general sysadmin support from your hosting provider, you can buy commercial support contracts for the software products you're using. This could be Ubuntu Linux, Nginx or MongoDB. Depending on the level of support you can get some pretty involved help when you need it most.

However, they're often very expensive and unaffordable as a startup. Even with the greater resources we now have, support contracts are aimed at enterprises with big budgets. One way to work around this is to be very involved with the projects you use. I was an early adopter of MongoDB and have a close relationship with 10gen, the company behind it, so I am able to get good deals on support.

Also consider what support you really need. Our support contract with MongoDB was well used in the early days because it was a new technology. It's significantly more stable nowadays, and with other products, like Apache for example, we've never had an issue.

GROWING AN OPS TEAM FROM ONE FOUNDER
David Mytton
@serverdensity (twitter.com/serverdensity)

David Mytton is the founder of Server Density (http://www.serverdensity.com/). He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan.

FIGURE OUT WHAT YOU HAVE TO DO AND WHAT CAN BE OUTSOURCED

I consider keeping core engineering in-house very important for technology/software companies but there are lots of



things that need doing to run operations that could be outsourced to (trusted) individuals on an ad-hoc basis. Engineers are terrible at valuing their own time and often use the argument: "why pay for something I could build/install/configure myself?". Candidates for this are things like running through PCI compliance checklists, setting up centralised logging, reorganising servers (e.g. upgrading base OSs), researching CDN providers, integrating CI tools, etc. You always want someone technical managing the project to keep things on track and validate the end results, but these are things you don't need to do yourself.

HACK TRAVELING

As part of the founding team, and even as an engineer, you're likely to have to travel at some point: to conferences, meeting customers, pitching vendors or maybe on holiday! It's relaxing to be uncontactable on the plane but it's also scary because you have no idea if everything is still running. On one of my trips to Japan, as soon as I stepped off the 12 hour flight to Tokyo Narita, I had a flood of SMS alerts as one of our MongoDB servers had encountered a problem 4 hours previously. One of our engineers had been assigned on call for my flight and had already worked with the guys at 10gen and resolved the problem.

You'll realise you become a slave to connectivity, so trips to Japan are fine, but Tajikistan isn't really an option. You need to be able to get internet access and power anywhere you are, using tricks such as visiting Starbucks, carrying external hotspots and not running things like updates when you're away!

DON'T FORGET THE HUMAN ASPECT

There are a lot of cool tools which help to automate processes, and these should be used as much as possible. However, it's still real people running things in the end. This is the really difficult bit of having a small team because everyone has to pitch in and it can be difficult to share the workload when just a few people know how things work.

You have to consider who will take the call when things break:

- How quickly can they get to a computer they can use to fix things? Are you out drinking on a Friday?
- What happens if someone falls ill? This could be a minor cold or major emergency. This could be the individual engineer or their family members. Does the on-call have enough phone battery? Can they hear their ringtone?
- Who is backup if the primary doesn't pick up? This is especially the case with outages. They often happen at inconvenient times and big incidents might require you to work for significant periods of time.
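
The questions above amount to an escalation order: try the primary, then fall through to whoever can actually respond. A minimal sketch of that idea follows, assuming a hypothetical `Engineer` record and `pick_responder` helper that are not part of the original article or of any particular paging tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Engineer:
    name: str
    near_computer: bool  # can they get to a machine to fix things?
    reachable: bool      # phone charged, ringtone audible, not ill


def pick_responder(rota: Sequence[Engineer]) -> Optional[Engineer]:
    """Return the first engineer in escalation order who can take the call."""
    for engineer in rota:
        if engineer.reachable and engineer.near_computer:
            return engineer
    return None  # nobody can respond: the rota has a gap


# Hypothetical rota: the primary is reachable but away from a computer,
# so the call should fall through to the backup.
rota = [
    Engineer("primary", near_computer=False, reachable=True),
    Engineer("backup", near_computer=True, reachable=True),
]
responder = pick_responder(rota)
print(responder.name if responder else "no cover")
```

Writing the rota down in this form makes a missing backup visible before an outage rather than during one.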

Dealing with communicating with customers, fixing problems and recovering data can be exhausting, especially when there's nobody else to help. The ultimate goal is to build your team so that shift-based on-call cover can be provided, but it's difficult in the beginning with limited resources (both for people and multi-geographic redundancy). Nobody is as invested in your service as you and your team. Although services like Rackspace's support are helpful in certain situations, they're never able to know the full story behind your service and how to deal with complex components. For example, MongoDB was a completely new database and didn't have single server durability for some time. A bad shutdown could require a lengthy database repair, so it was important to take steps to avoid one, such as properly shutting the database down before powering off the server.

Knowing about the weaknesses and how to deal with them is something that requires greater knowledge of your setup than basic vendor support is going to provide. These things should be a stopgap or supplement to the end goal of growing your own team. The whole point of devops is that it's a mixture of engineering and operations so you don't need to hire dedicated sysadmins. This works well for small startup teams but you will eventually want someone (or multiple people) who is responsible for the day to day operations. Engineers still engage with the team, can deploy, work on testing and debug problems, but things like dealing with a failed disk drive or implementing backups are really outside the remit of devops in a large team.

You know you're there when you can start hiring site reliability engineers!



MATTHEW SKELTON ON SOFTWARE OPERABILITY

WHAT IS SOFTWARE OPERABILITY AND WHY IS IT IMPORTANT?

Operability is an engineering term concerning the qualities of a system which make it work well over its lifetime, and software operability applies these core engineering principles to software systems. An operable software system is one which delivers not only reliable end-user functionality, but also works well from the perspective of the operations team. Such software has been built to operate successfully without needing application restarts, server reboots, load-balancer hacks, or any of the countless other fixes and work-arounds which operations teams have to use in order to make many business software systems work in practice on a daily basis. Software systems which follow software operability good practice will tend to be simpler to operate and maintain, with a reduced cost of ownership, and almost certainly fewer operational problems.

WHERE DID YOUR INTEREST IN OPERABILITY COME FROM?

Early in my career I built software systems for MRI (brain) scanners and oil & gas exploration. Operability for such systems is essential; it's no use building an MRI scanner which can produce 3D brain images if it needs rebooting after taking every second image. Likewise, it was cheaper to drill a new oil well than to extract a faulty down-hole pressure gauge; these systems had to operate reliably with minimal human intervention. Since then I have too often seen the negative effects of operational features being dropped before go-live, which usually results in significant operational costs and more incidents in Production. There is no good reason in 2013 why businesses should put up with (and pay for) second-rate software which needs arduous human attention every few hours or days just in order to maintain normal operation. In my experience, most modern business software is simple enough (at a systems level at least) that we can significantly reduce operational cost and downtime by introducing software operability as a key concern for software product delivery teams. Ultimately, it's about lower cost of ownership, better engineering, and fewer late nights debugging flaky software!

WHAT ARE SOME OF THE LOW HANGING FRUIT A SOFTWARE TEAM CAN TACKLE TO MAKE THEIR SOFTWARE MORE OPERABLE?

The best thing a software team can do to make their software more operable is to write a draft operation manual alongside feature development. The operation manual (aka run book) eventually contains the full details of how the software system is operated in Production. By writing a draft operation manual, the software team can demonstrate to the operations folks that either all the major operability concerns have been addressed or that some operability criteria are beyond the expertise of the software team, but at least there will be no nasty surprises when the software is put into operation. The act of having to think about things like backups, time changes, health checks, and clear-down steps in the context of their software tends to mean that the software team members will implement small but crucial changes to the software to provide hooks for monitoring, alerting, backups, failover, etc., which improve the operability of the software.
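
As an illustration of the kind of small hook described above, here is a minimal sketch of a health check that a monitoring system could poll. It is an assumption-laden example, not something from the interview: the function name `health_check`, the 10% free-space threshold and the shape of the returned dictionary are all hypothetical, and only the Python standard library is used.

```python
import json
import shutil


def health_check(path: str = "/", min_free_fraction: float = 0.10) -> dict:
    """Report whether the disk holding `path` has enough free space left."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem so the division cannot fail.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "check": "disk_space",
        "path": path,
        "free_fraction": round(free_fraction, 3),
        "healthy": free_fraction >= min_free_fraction,
    }


# A monitoring agent could call this periodically and alert on "healthy": false.
status = health_check()
print(json.dumps(status))
```

Returning a structured result rather than a bare true/false gives the operations team something to put in the run book: what was checked, against which path, and how close to the limit the system is.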

BEYOND THAT, WHAT WOULD REPRESENT A HIGHER LEVEL OF OPERABILITY?

Software with a high level of operability is easy to deploy, test, and interrogate in the Production environment. Highly operable software provides the operations team with the right amount of good-quality information about the state of the service being provided, and will exhibit predictable and non-catastrophic failure modes when under high load or abnormal conditions. Systems with good software operability also lend themselves to rapid diagnosis and simple recovery following a problem, because they have been built with operational criteria as first-class concerns.

Matthew Skelton
@matthewskelton (twitter.com/matthewskelton)

Matthew has been building, deploying and operating commercial software systems for over 13 years. He has engineered software systems for organisations in finance, insurance, pharmaceuticals, travel and media, as well as for MRI brain scanners and oil and gas exploration. He looks after build and deployment at thetrainline.com, the UK's leading rail ticket vendor which operates one of the country's busiest web infrastructures.

How do you make the case for operability when the main business focus is



usually on features?

I think one of the most important changes to make is to stop using the term "non-functional requirements" for things like performance and stability requirements; instead, use the term "operational requirements", or even better, "operational features", and include these in the product backlog alongside end-user features. This gets away from the artificial (and unhelpful) contrast of functional vs non-functional requirements, and helps to communicate to the business that the operational aspects of the software also require specific features if the business requirements are going to be met.

A useful approach (discussed at the excellent DevOpsDays 2013 event in London) is to make the product owner responsible not only for feature delivery but also operational success of the software; after a few early morning Priority 1 call-outs due to the application servers needing a restart, the product owner will probably start to realise the importance of operational features! Making any operational problems more visible is also crucial. If the operations team needs to restart the app servers every night, make this visible, and include the product owner or business sponsor in the email notifications every day. Draw analogies with systems familiar to the product owner: if they had to have their car fixed by a mechanic every two days, they'd soon either buy a new car or pay to have the faulty part replaced. So, don't hide the effort which you're expending on keeping their software product running; make sure they see the cost (and the pain!).

WHERE SHOULD WE LOOK FOR FURTHER INFORMATION ON OPERABILITY?

A good starting point to learn more about software operability is the excellent book Patterns for Performance and Operability by Ford, Gileadi, et al (ISBN 978-1420053340), which explains the core concepts and works through several real-world examples. In the 1980s and 90s the US space agency NASA did some really useful work on operability as part of the space shuttle programme, and much of the research is available online; Richard Crowley's talk on Developing Operability at SuperConf 2012 (http://rcrowley.org/2012/02/25/superconf.html) is also worth reading and understanding. I recently began a blog at softwareoperability.com which I plan to turn into a book in late 2013 or early 2014 to help software teams get to grips with software operability. It's worth saying that teams with a DevOps approach will generally produce systems with better operability than teams split into the traditional Dev and Ops silos. I'm approaching software operability from this siloed world of Dev and Ops, mainly because this is where most organisations still are today, and in fact, I hope that by gaining a better understanding of software operability, many engineering teams will move instinctively towards a DevOps model.

More info can be found at softwareoperability.com (http://softwareoperability.com/), @Operability (https://twitter.com/Operability) and #operability on Twitter.



WHY DEVOPS MATTERS (TO DEVELOPERS)

Benjamin Wootton
@BenjaminWootton (twitter.com/BenjaminWootton)

Benjamin Wootton is the Principal Consultant at Autumn Devops (http://www.autumndevops.com/), a London, UK based consultancy specializing in DevOps and software release automation. He has over 10 years' experience working at the intersection of agile Java software development and operations. He is the maintainer of the popular DevOps Friday newsletter (devopsfriday.com).

DevOps stems from the idea that developers and operations should work more closely together (communicating, knowledge sharing, and collaborating) to increase the quality of the systems that we build and operate.

Though DevOps is an idea that is finding a lot of success and adoption, most of the enthusiasm and thought leadership appears to come from the Operations side of the fence.

This is of course understandable. With Operations teams being on the front line and talking to end users daily, they have an obvious motivation not to upset customers through downtime, and an obvious personal motivation to avoid fire-fighting issues in favour of working on higher value projects.

However, as a developer who has always worked at this intersection of the two teams, I feel that developers should also sit up and give more credence to what is coming out of the DevOps community. By opening up communication paths and adopting Operations-like skills and mindsets, we can likely all benefit, both as individuals and as teams, and in the quality of software that we deliver.

Here are some of the reasons why I think this is the case:

DEVOPS INCREASES THE FOCUS ON PRODUCTION

Though software teams might divide themselves into development, QA and operations, these can be slightly arbitrary distinctions. The business who are paying for all of this only care about the net output of what those three teams deliver: the value that the finished production software is bringing to the organisation.

Our goal as developers should be to deliver not just source code but a reliable product, feature or system that is in production and that people will gain business benefit from. Though we might be personally motivated by cutting code, it is all for nothing if our work never makes it to production, or if the users of the application have a bad time once it's out.

To my mind, the operational focus on production and delivery espoused by DevOps is a good thing which usually leads to much more net value for the business. DevOps oriented development teams have a focus on value and their user base, rather than their code base.

DEVOPS HELPS YOU IMPROVE YOUR SITE RELIABILITY

If your application has downtime, customers won't care if it's due to a software bug, a hardware outage or a failed rollout. They won't care if it was a human error or some arbitrary combination of events. All they care about is that they can't use the system as intended.

This might be a product of the systems on which I've worked, but with good unit and integration testing and good QA testing, it is possible to catch most software bugs that would impact a large percentage of the user base. However, where things more typically go wrong is when the system comes into contact with the real world. For instance, we might find our code performs badly under real world load, that a disk fills up, or that users use the application in a way we didn't anticipate, as they are prone to do!

A DevOps oriented developer or team has a much more stringent focus on these issues and general site reliability. They'll not only test their code; they will think about failure scenarios and mitigate them before code is even released. They'll think carefully about detailed testing of their features to minimise the risk of them impacting the broader production system. They will plan and stage their upgrades to de-risk releases, and always have a rollback strategy (http://benjaminwootton.co.uk/how-to-do-rollback-well/). They will talk regularly with operations to ensure that they are taking into account their experience with keeping the site available. In short,


11/15

11

DevOps rightly places site reliability front and center, and almost all developers will benefit from this mindset.

This focus on site reliability might mean that sometimes we churn out fewer pure lines of code in a day, but it means that we move forward more predictably and reliably, keeping the system stable and available.

DEVOPS HELPS YOU BUILD BETTER SOFTWARE

By being more operationally aware of the production context that our code lives within, developers will also design and build better software.

It might be something simple like choosing to add that additional logging statement that you know will make troubleshooting easier later on, or something more complex such as designing a component for horizontal scalability for future growth scenarios. These kinds of operationally aware decisions can lead to massive improvements in the net productivity of the team and the quality of their software.

It's only by increasing communication with operations teams that we developers will learn about these concerns and incorporate them into our designs and everyday coding decisions. Simple things such as joint production incident post-mortems or the inclusion of operations staff in your early design process can help you to move in the right direction. Again, these practices are core to the DevOps philosophy.

DEVOPS IMPROVES YOUR CAREER PROSPECTS

In addition to making you a more well-rounded developer with a focus on production, operations skills such as system administration, monitoring, scripting, change management and the broader knowledge and experience required to maintain and run complex systems are genuinely useful for developers to acquire.

In most of the software teams and hiring decisions I have been involved in, a developer with this profile would have been more valuable than someone with superior coding skills but without the same degree of production awareness.

I believe that as a result of DevOps and other trends, this will continue, i.e. that the best developers will increasingly be those who are the most operationally aware, who can code but also have the knowledge, skills and experience to reliably deliver a working production system over the long haul. This is particularly true in these tough and resource-constrained economic times. With fewer people having the luxury of saying "it's not my job", the generalist will get ahead. (I guess for some people, such as those in startups and small companies, it was ever thus, with developers pitching in on operations-type stuff such as deployments and upgrades.)

DEVOPS HELPS DEVELOPERS TO OWN THEIR PLATFORMS AND INFRASTRUCTURE

A big element of the DevOps movement is the idea of infrastructure as code: that we can define our infrastructure and configuration in descriptive files and metadata, and then be able to test and repeatably deploy that infrastructure and our applications on top of it.

This is such a compelling idea with many benefits, and yet developers do not always embrace and own configuration management tools as much as our operations colleagues. By moving towards infrastructure as code and configuration management, developers are given the ability to own and bring under their control the infrastructure that their code runs on.

People often say that Apple computers are so reliable because Apple owns the full hardware and software stack. Well, with infrastructure as code and repeatable deploys, developers also get to develop, deploy and own the whole platform on which their software runs.

"It worked on my machine" or "it worked in QA" should be a thing of the past in a mature DevOps team making use of tools such as Vagrant and Puppet, because the development, test, and production environments should all be in line, and all infrastructure changes should also be versioned and tested alongside the code assets.

Doing this well removes so many unknowns and can lead to massive improvements in the efficiency and quality of software development.

DEVOPS HELPS YOU MANAGE MODERN INFRASTRUCTURE

DevOps has emerged at a time when cloud hosting, infrastructure as a service, and platform as a service are also reaching widespread adoption. Cloud and PaaS make the hosting environment much more fluid. For instance, over time operations might want to use these platforms to their full potential and scale capacity up or down dynamically. To do that, they will need to work with development much more closely to work out how to support this in the applications.

Because of this, I would argue that developers today need to be more aware of the operational environments in which their applications will operate.

Increasingly, we will also find that cloud infrastructure will be managed through software: for instance, the ability to provision new boxes via APIs or deploy applications onto a PaaS. Managing large-scale infrastructure in an automated fashion is likely to start to look more and more like development work. Development and operations will increasingly start to look like one and the same role.

So these are just a few of the reasons why I think developers need to look at DevOps in a lot more detail. Some of this is about a broadening of mindset from "my job responsibility is to deliver good code" to "my job is to deliver and operate a successful system". Other parts are about acquiring the skills that will actually allow you to do that. With operations staff also actively moving towards more of a developer mindset and skillset, DevOps is likely to continue to grow in importance.

The State of the Art Monitoring Stack

Last week I had the pleasure of attending the first annual Monitorama conference. This was a conference aimed towards advancing the state of open source monitoring and trending software.

For many of us starting in this area, our concept of monitors consists of top, some apache, mysql and application log files and perhaps an external ping service that tells us when our web site is unavailable. Anything beyond that generally ran into the commercial product realm. We were scared off of monitoring by these old monolithic products that required huge licensing fees and armies of professional services people.

Thankfully, times have changed.

And our application footprint has grown. No longer are we just deploying web servers and databases. Our application stack starts with our automated testing framework and runs through continuous integration and continuous deployment. Jenkins, Travis, Puppet/Chef, etc... they're all critical. It also includes our deployment partners... that army of SaaS applications we use to make our life easier. Any SaaS solution worth its salt has a status API available for tracking availability. Our monitoring needs are now wide and diverse.

My first exposure to the next generation of monitoring tools came with the awesome Etsy post "Measure anything, measure everything".

The concept of "Measure Everything" wasn't new to me. I'd been working on StackTach for OpenStack around the same time and understood the value of getting a visual representation of the internals of an application. Even from my old management days we used to say "you can't manage what you can't measure". I lived this with my Google Analytics experiences from running various web sites, and my software development management interests were aiming towards Six Sigma techniques over the hand-wavey agile methods. Essentially, numbers are good. But this was giving us a way to apply those same measurement techniques to running software. It was a lens into the black box. Could the days of parsing log files be over?

The first generation of these new monitoring tools included Zenoss, Nagios, RRDtool, Cacti, Munin and Ganglia to name a few. They were built out of necessity and often have some really nasty warts that people just hate. This latest generation of tools has learned from their mistakes.

The Etsy tool chain started with statsd and graphite. This introduced me to the concept of using UDP packets for instrumenting the running applications... which was pretty brilliant. For those unfamiliar with statsd and graphite, here's the flow: your application wants to measure something, so it sends a UDP packet to the statsd service. UDP packets are lossy and unreliable but fast for large amounts of data. Most large video networks send via UDP packets.

statsd is a node.js in-memory data aggregator (it accumulates received data and every so often sends it to graphite). graphite is a django app that archives received data and gives a funky web interface for presenting and querying the data.

There are a number of cool things happening here:

- Adding statsd integration to an existing application is very easy. No special libraries needed and sockets are available in nearly all languages.

- Since statsd uses UDP there is very little risk of the production application crashing if statsd fails. The packets just get lost.

- Since statsd is in-memory, it can process a lot of data very quickly. But rather than take on the task of archiving and disk access, it simply forwards the results to something that can do it better.

- graphite has an easy REST interface which makes it easily accessible by technical product managers to create their own dashboards and status reports.
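
To make the flow above concrete, here is a minimal sketch of the client side in Python, assuming a statsd daemon listening on its default UDP port 8125 on localhost. The metric names and the helpers format_metric and send_metric are hypothetical, made up for illustration; only the "name:value|type" line format comes from statsd itself.

```python
import socket

STATSD_HOST = "127.0.0.1"  # assumption: statsd runs on the same box
STATSD_PORT = 8125         # statsd's default UDP port


def format_metric(name, value, metric_type):
    """Build one statsd line, e.g. b'page.views:1|c' for a counter."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type="c",
                host=STATSD_HOST, port=STATSD_PORT):
    """Fire a single metric at statsd over UDP and never raise into the caller."""
    payload = format_metric(name, value, metric_type)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection and no reply: if nobody is listening,
        # the packet is simply lost.
        sock.sendto(payload, (host, port))
    except OSError:
        pass  # metrics are best effort; the application carries on
    finally:
        sock.close()
    return payload


if __name__ == "__main__":
    send_metric("checkout.completed", 1, "c")    # hypothetical counter
    send_metric("checkout.duration", 320, "ms")  # hypothetical timer, in ms
```

Because the socket is never connected and errors are swallowed, a dead statsd costs the application nothing more than a dropped packet, which is the property described in the list above.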


Based in Nova Scotia, Canada, Alex "Sandy" Walsh is the owner of Dark Secret Software. He has been a senior professional developer for nearly 20 years and a Pythonista for 10 years. He is currently a developer on the OpenStack project with Rackspace. You can learn more about him at sandywalsh.com or follow @TheSandyWalsh.

Sandy Walsh

@TheSandyWalsh

Side note: if your application is written in Python and you want to experiment with this stuff without touching your existing code base, have a look at the Tach application. This monkey-patches your Python application and sends the output to statsd or graphite directly. Pretty cool. Although it was originally written for use with OpenStack, it can work with anything.

But the real insight here is a set of atomic, well-focused tools that could be put together to create a monitoring stack. The tool chest of the DevOps team just expanded.

As our experience with statsd and graphite grew within the company, we also saw where the monitoring stack failed. A UDP-based approach won't work for billing or auditing. For these scenarios you need to have a reliable transport for events. In OpenStack we publish event notifications to AMQP queues for consumption by various other tools. These are important events, often with large payloads. When the StackTach application is unavailable these queues can grow very quickly, and we don't want to drop events. This is manageable for something like OpenStack Compute, but other applications like Storage produce an incredible amount of data across a wide range of servers. Using a notification-based system would be difficult. Instead we needed to look at syslog-based archiving and processing solutions. The new monitoring stack offers tools like LogStash and, in the OpenStack case, Slogging.

Then there is the post-processing. To add value to the raw events we often need to apply other functions to the data such as time series averaging. This can be tricky. We need to wait for all the collected data to arrive before we can start the post-processing. We may need to ensure proper ordering. Historically this would be done with cron jobs and batch processing, but the new monitoring stack includes tools like Riemann which can do this post-processing inline.
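
As a rough illustration of why ordering and completeness matter, the following Python sketch does the batch-style version of time series averaging: it buckets (timestamp, value) events into fixed windows and averages each one. The function name window_averages, the 60-second window and the sample data are assumptions for illustration; this is not how Riemann itself is implemented.

```python
from collections import defaultdict


def window_averages(events, window_seconds=60):
    """Average (timestamp, value) events per fixed time window.

    Events may arrive in any order; each one is assigned to the window
    that contains its timestamp, and windows are returned oldest first.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Hypothetical response times in ms, deliberately out of order.
    collected = [(125, 30.0), (10, 10.0), (70, 40.0), (50, 20.0)]
    for start, mean in window_averages(collected):
        print(f"window starting at {start}s: mean {mean:.1f} ms")
```

The averages are only trustworthy once every event for a window has been collected, which is the wait that an inline stream processor is meant to avoid.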

It seems evident that Nagios isn't going anywhere any time soon, but there are some other tools offering alternatives such as Shinken and Sensu.

Recently our team has been working on bringing what we've learned with StackTach to the OpenStack-blessed monitoring solution called Ceilometer. Without standing back and looking at the larger monitoring community it would have been very easy to want to recreate an entire monitoring stack on our own. But now it's clear that we can focus on the minimal set of missing functionality and augment that from an already powerful set of tools. This is a very attractive proposition for one simple reason: the project has an end in sight. There are lots of fun problems out there to tackle and knowing you don't have to reinvent the wheel is very compelling.

There is a cost though. The monitoring stack today consists of a variety of tools all written in different languages and each with different care and feeding instructions. One could argue that the workload on operations will only increase by mixing and matching. My knee-jerk reaction is to agree, but I know that the greater win is to get familiar with all of these new tools. In production, these monitoring tools need monitoring as well. So we may have to monitor Java, Ruby, Python and C# VMs running bytecode from a potential variety of languages.

If this all seems too daunting, perhaps the hosted offerings are a better choice for you. For nearly every open source offering there are hosted offerings. Look at loggly, papertrail, pagerduty, librato, datadog, hostedgraphite, boundary, new relic, etc.

This brings me back to Monitorama. The Monitorama conference had a format that worked very well for me for the following reasons:

- It was only two days long.

- Day One focused on hearing about the state of the art from industry leaders.

- Day Two was tactical with the tools and included a hackathon which let you understand where the real-world pain lived in each of these components.

- It was small enough that you could actually talk to people and have meaningful conversations.

The Day One talks made it clear that Alert Fatigue (a term borrowed from the medical industry) is a big problem. Too many alerts hitting our inbox. Some are important, most are noise. There are people working on it, but it's perhaps the biggest source of angst for operations currently.

Side story: for the hackathon I started work on a tool that allowed members of the company to track external events that might affect production. Things like sales events, big holidays, new customer deployments or internal events such as new code deployments, hardware upgrades, etc. The idea was to have these events show up on the spikes in the dashboard graphs so we could say "That spike was due to Foo and that ravine was due to Blah". I made some good progress for the day and then one of the other attendees showed me his side project Anthracite, which does all this and more. The author was sitting in the room next to me. What are the odds?

For a while I was getting disillusioned with this space because I saw it dominated by commercial solutions, or the problem seemed so big it would be a lifetime of work to build as open source. But now I see there are viable open source components and there is enough of the stack available that we can focus on some of the smaller missing pieces. Also, there is a smart community out there facing the exact same problems and actively working on solutions. There is a light at the end of the tunnel.

I may not attend Monitorama EU, but definitely the US one next year. But for now, I've got some products to learn.

The Benchmark You're Reading Is Probably Wrong

Mikeal Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response model, and makes the following statement:

"MongoDB, by default, doesn't actually have a response for writes."

In response, one of the 10gen employees (the company behind MongoDB) made the following comment on Hacker News:

"We did this to make MongoDB look good in stupid benchmarks."

The benchmark in question shows a single graph, which demonstrates that MongoDB is 27 times faster than CouchDB on inserting one million rows. At first glance, the benchmark immediately looks silly if you've ever done serious benchmarking before. CouchDB people are smart, inserting such a small number of elements is a relatively simple feature, and it's almost certain that either they would have fixed something that simple or they had a very good reason not to (in which case the benchmark is likely measuring apples and oranges).

Let's do some back of the envelope math. Roundtrip latency on a commodity network for a small packet can range from 0.2ms to 0.8ms. A single rotational drive can do 15000RPM / 60sec = 250 operations per second (resulting in close to 5ms latency in practice), and a single Intel X25-M SSD drive can do about 7000 write operations per second (resulting in close to 0.15ms latency).

The benchmark demonstrates that CouchDB takes an average of 0.5ms per document to insert one million documents, while MongoDB does the same in 0.01ms. Clearly the rotational drives are too slow to play a part in the measurement, and the SSD drives are probably too fast to matter for CouchDB and too slow to matter for MongoDB. However, CouchDB appears to be awfully close to commonly encountered network latencies, while MongoDB inserts each document 50 times faster than commodity network latency.
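
The arithmetic above is easy to reproduce. The short Python sketch below uses only the figures quoted in the text; the helper name per_op_latency_ms and the variable names are made up for illustration. Note that the pure rotation time works out to 4ms, a little under the "close to 5ms in practice" figure quoted above.

```python
def per_op_latency_ms(ops_per_second):
    """Convert a throughput figure into milliseconds per operation."""
    return 1000.0 / ops_per_second


# Storage figures quoted in the text
rotational_ops = 15000 / 60                        # rotations per second
rotational_ms = per_op_latency_ms(rotational_ops)  # time for one rotation
ssd_ms = per_op_latency_ms(7000)                   # time for one SSD write

# Benchmark averages and network range quoted in the text
couchdb_ms = 0.5
mongodb_ms = 0.01
network_ms = (0.2, 0.8)

# Does the CouchDB average fall inside the commodity network range?
couchdb_in_network_range = network_ms[0] <= couchdb_ms <= network_ms[1]
# How many times faster is the MongoDB per-document average?
mongodb_speedup = couchdb_ms / mongodb_ms

if __name__ == "__main__":
    print(f"rotational drive: {rotational_ms:.2f} ms per operation")
    print(f"SSD:              {ssd_ms:.2f} ms per operation")
    print(f"CouchDB average within network latency range: {couchdb_in_network_range}")
    print(f"MongoDB per-document speedup: {mongodb_speedup:.0f}x")
```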

At first observation, it appears likely that the CouchDB client library is configured to wait for the socket to receive a response from the database server before sending the next insert, while the MongoDB client is configured to continue sending insert requests without waiting for a response. If this is true, the benchmark compares apples and oranges and tells you absolutely nothing about which database engine is actually faster at inserting elements. It doesn't measure how fast each engine handles insertion when the dataset fits into memory, when the dataset spills onto disk, or when there are multiple concurrent clients (which is a whole different can of worms). It doesn't even begin to address the more subtle issues of whether the potential bottlenecks for each database might reside in the virtual memory configuration, or the file system, or the operating system I/O scheduler, or some other part of the stack, because each database uses each one of these components slightly differently. What the benchmark likely measures is something that is never mentioned: the latency of the network stack for CouchDB, and something entirely unrelated for MongoDB.
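
A toy model shows how much client behaviour alone can account for. The Python sketch below is not a measurement of either database: it assumes a 0.5ms network round trip (picked from the 0.2ms to 0.8ms range above) and a hypothetical 0.01ms of server work per insert that is identical for both clients. The function names are invented for this illustration.

```python
def sync_total_seconds(n_docs, round_trip_ms, server_ms):
    """Client waits for each response before sending the next insert."""
    return n_docs * (round_trip_ms + server_ms) / 1000.0


def pipelined_total_seconds(n_docs, round_trip_ms, server_ms):
    """Client keeps sending without waiting; network latency is paid once."""
    return (round_trip_ms + n_docs * server_ms) / 1000.0


if __name__ == "__main__":
    n = 1_000_000
    rtt_ms = 0.5       # assumed commodity network round trip
    server_ms = 0.01   # assumed server-side cost, same engine speed for both

    waiting = sync_total_seconds(n, rtt_ms, server_ms)
    streaming = pipelined_total_seconds(n, rtt_ms, server_ms)
    print(f"wait for every response: {waiting:.1f} s "
          f"({waiting / n * 1000:.3f} ms per document)")
    print(f"fire and forget:         {streaming:.1f} s "
          f"({streaming / n * 1000:.3f} ms per document)")
```

Under these assumptions the waiting client averages about 0.51ms per document and the non-waiting client about 0.01ms, even though the modelled server is equally fast in both cases. Numbers of that shape say more about the network and the client library than about the storage engine.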

Unfortunately, most benchmarks published online have similar crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks. There is no easy solution: performing proper benchmarks is very error-prone, time-consuming work. It's good to be very skeptical about benchmarks that show a large performance difference but don't carefully discuss the methodology and potential pitfalls. As Brad Pitt's character says at the end of Inglourious Basterds, "Long story short, we hear a story too good to be true, it ain't."

This blog post originally appeared on rethinkdb.com/blog in July 2010.

RethinkDB Team

@RethinkDb

The RethinkDB team is working on a scalable, open-source, distributed document database system that features a pleasant query language, parallelized architecture, and table joins. You can learn more at rethinkdb.com.

Sign up at DevOpsFriday.com for next week's issue!