DevOps Friday - 12th April 2013
7/28/2019 DevOps Friday - 12th April 2013
Insight from DevOps Thought Leaders
12th April 2013
Welcome to DevOps Friday! Each week, we summarise the best content of the week coming out of the DevOps community.
The newsletter is curated by DevOps enthusiast, Benjamin Wootton.
I welcome any feedback or suggestions for content for the newsletter via Twitter @benjaminwootton.
To sign up for future editions, please visit the DevOps Friday home page.
12th April 2013
DevOps Friday is proudly supported by ServerDensity.
Server Density is a server and website monitoring tool.
We're supporting DevOps Friday in its quest to continually publish bite-size insights and news into the DevOps world.
If you like this, please visit ServerDensity and the ServerDensity blog.
Curated by Benjamin Wootton
www.devopsfriday.com
CONTENTS

Growing an Ops Team from One Founder - David Mytton
In the early days of 2009, it was just me running the Server Density monitoring infrastructure. Over the last 4 years the service has grown in terms of team members, data volume, customers and infrastructure, so here are a few lessons from scaling the ops team and how things are run.

Matthew Skelton on Software Operability - Matthew Skelton
Operability is an engineering term concerning the qualities of a system which make it work well over its lifetime, and software operability applies these core engineering principles to software systems.

The Benchmark You're Reading Is Probably Wrong - RethinkDB Team
Unfortunately, most benchmarks published online have crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks.

Why DevOps Matters (to Developers) - Benjamin Wootton
DevOps stems from the idea that developers and operations should work more closely together to increase the quality of the systems that we build and operate, but most of the enthusiasm and thought leadership appears to come from the Operations side of the fence.

Application Support Is Perfect for DevOps - Matt Watson
Finally, organizations can embrace a DevOps approach that improves application support even if they don't have a formal DevOps team. Stackify is the only solution that provides the proper access, tools and intelligence to improve application support efficiency.

The State of the Art Monitoring Stack - Sandy Walsh
For many of us starting in this area, our concept of monitors consists of top, some apache, mysql and application log files. We were scared off monitoring by these old monolithic products that required huge licensing fees and armies of professional services people. Thankfully, times have changed.
RELAUNCHING DEVOPS FRIDAY

Around a year ago, I started a small newsletter summarising the best DevOps-related links of the week.

Since that time, interest has continued to grow in DevOps. Fantastic blog posts and articles are coming out each week, advancing the state of the art. Conferences are generating massive amounts of interesting content and discussion. Discussion on Twitter and on the podcasts is entertaining and educational.

Considering this, I want to take DevOps Friday to the next level, using it as a hub to capture and communicate the best content of each week. If you like this issue, please consider sharing it with friends and colleagues and encouraging them to sign up at the DevOps Friday home page.

THIS WEEK IN DEVOPS

Keith and Mario's Guide to Fast Websites
MongoDB: Large Scale Data Centric Applications
Hiring for the DevOps Toolchain: The Need for Generalists
How Badly Set Goals Create a Tug-of-War in Your DevOps Organization
Treating Servers as Cattle, Not as Pets
Achieving Awesomeness with Opscode Chef (Part 2)
Making A Point With SLAs
Amazon Cloud: A River Runs Through It
Using Message Queues in Cloud Applications
Are You Unknowingly Replicating Your Failure as a DBA?
When DevOps emerged in 2009, the gap between development and operations teams finally started to get the kind of media and vendor attention it deserved. DevOps gets developers more involved in IT operations so they can more rapidly resolve software issues that arise after deployment. Without access to production applications and servers, even development managers and system admins need help identifying and solving problems, which is horribly inefficient.
Some of us have been doing DevOps even before it had a name. At my last company, the lead developers were heavily involved in hardware purchases, setting hardware up, deploying code, monitoring systems and much more. The problem was that only three of the 40 developers had production access. The chosen three (including me) spent an inordinate amount of time helping others troubleshoot and fix application bugs. While I didn't trust the junior developers with the keys to the kingdom, I would nevertheless have preferred them to have the ability to fix their own bugs. Because our application support processes weren't very efficient, I wasted a lot of my own time fixing bugs instead of building new features.
Later, I started Stackify because I believe that more developers should be involved in production application support. That way, a couple of employees like the three of us at my old job don't become a bottleneck. Meanwhile, junior developers, QA and even less technical support people can get server access to view log files and other basic troubleshooting information. Sadly, in most companies today, the lead developer or system admin ends up tracking down a log file or finding some minor bug in another developer's app when they should be working on more important projects.
Developers should be more involved in the design and support of the infrastructure our applications rely on, since we are ultimately responsible for the applications we create. We should be able to deploy our applications, monitor production systems, ensure everything is working properly and be held responsible when our applications fail in production.
Finding and fixing bugs is often more difficult than it sounds, however. Just think for a moment: what do your developers need access to? If your team is anything like mine was, they need:
A database of application exceptions
Application and server log files
Windows Event Viewer
Application and server config files
SQL databases to test queries
Scheduled jobs history
Server monitoring tools
Performance monitoring tools
and the list goes on.
When a developer is trying to fix a bug, nothing is more frustrating than lacking the details necessary to reproduce or fix the problem. Troubleshooting application problems can require access to a lot of information, which in turn involves a lot of screens and a lot of logins. Imagine getting all the information you need in a single screen and then having the ability to drill down into it with a couple of mouse clicks.
As appealing as that sounds, giving developers access to the information they need has been difficult because:
The data resides in many locations
Too many tools exist to access different types of information
It can be difficult or impossible to control access rights and protect sensitive data
Developers should be prevented from making changes
It is difficult or impossible to audit what developers access
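The last three obstacles (read-only access, protecting sensitive data, and auditing) can be illustrated with a small sketch. This is not how Stackify works; it is a hypothetical helper, `view_log`, with made-up redaction rules for card numbers and email addresses, assuming logs are plain UTF-8 text files and that an append-only TSV file is an acceptable audit trail.

```python
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical redaction rules: mask things that look like card numbers and emails.
REDACTIONS = [
    (re.compile(r"\b\d{13,16}\b"), "[REDACTED-CARD]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED-EMAIL]"),
]


def view_log(user: str, log_path: Path, audit_path: Path, tail: int = 50) -> list:
    """Return the last `tail` lines of a log, redacted, and record who looked.

    Read-only by construction: the log is opened with mode "r" and never written.
    """
    with log_path.open("r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()[-tail:]
    redacted = []
    for line in lines:
        for pattern, replacement in REDACTIONS:
            line = pattern.sub(replacement, line)
        redacted.append(line)
    # Append one audit record per access: timestamp, user, action, file.
    stamp = datetime.now(timezone.utc).isoformat()
    with audit_path.open("a", encoding="utf-8") as audit:
        audit.write(f"{stamp}\t{user}\tread\t{log_path}\n")
    return redacted


# Usage: a junior developer reads a log without seeing customer data.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"
    log.write_text("payment ok for bob@example.com card 4111111111111111\n", encoding="utf-8")
    audit = Path(tmp) / "audit.tsv"
    shown = view_log("junior-dev", log, audit)
    audit_lines = audit.read_text(encoding="utf-8").splitlines()

print(shown)  # ['payment ok for [REDACTED-EMAIL] card [REDACTED-CARD]']
```

Regex-based redaction is a blunt instrument; a real deployment would more likely avoid logging sensitive values in the first place and enforce read-only access at the operating-system level.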
To overcome the challenges outlined in this post, my team and I at Stackify built a solution that gives developers access to all the information they need to provide effective application support. It also solves the problems that have prevented such information sharing in the past. With Stackify, you can eliminate bottlenecks in development teams and scale application support teams without additional head count.
Finally, organizations can embrace a DevOps approach that improves application support even if they don't have a formal DevOps team. Stackify is the only solution that provides the proper access, tools and intelligence to improve application support efficiency.
APPLICATION SUPPORT IS PERFECT FOR DEVOPS
Matt Watson founded Stackify in 2012 and as CEO provides the vision and leadership for the direction of the company. Matt's goal is to simplify IT operations via Stackify's DevOps solution. Prior to founding Stackify, he was the founder and CTO of VinSolutions. Matt is an entrepreneur at heart and excels at product and software development.
Matt Watson
@stackify
In the early days of 2009, it was just me running the Server Density monitoring infrastructure. The service came out of beta in the summer and immediately had a few paying customers, which helped to fund the rental of a couple of slices from Slicehost (fancy VPSs). The volume of traffic, simplicity of the service components and small number of servers meant that there were few problems.
Over the last 4 years the service has grown in terms of team members, data volume, customers and infrastructure, so here are a few lessons from scaling the ops team and how things are run.
BOOTSTRAPPING OFTEN MEANS LEAVING THINGS TO THE LAST MINUTE
Ideally you'll anticipate problems and have a solution well in advance, but that's not always possible. The most likely reason in the early days is cash, or the lack of it.
In August of 2009 I'd just completed our migration from MySQL to MongoDB, and it still had problems with eagerly eating up disk space. This prompted setting up a new server with increased disk space, because resizing a Slicehost instance would've meant some hours of downtime. It went down to the very last few bytes of remaining disk space as the sync completed.
IT ALSO MEANS TRYING TO FIND THE QUICKEST WAY TO DO THINGS
Time is something you don't have much of, and one of the slowest things is transferring large quantities of data over the internet. We had an unexpected failure where we had to do a full resync of a MongoDB slave in a different data centre, which would've taken 6 days. Instead, we copied the data onto a USB disk drive and had UPS ship it to the other facility. Network transfer speeds worked out at around 5MB/s, whereas UPS delivered at 11MB/s.
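The arithmetic behind the shipped-disk decision can be worked through in a few lines. The article does not state the data volume, so this sketch backs it out from the quoted figures (about 6 days at about 5MB/s, taking MB as 10^6 bytes) and then compares it with the quoted effective UPS rate of 11MB/s; the helper names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_volume_mb(rate_mb_s: float, days: float) -> float:
    """Data volume (MB) moved at a constant rate over the given number of days."""
    return rate_mb_s * days * SECONDS_PER_DAY


def transfer_days(volume_mb: float, rate_mb_s: float) -> float:
    """Days needed to move volume_mb at a constant effective rate."""
    return volume_mb / rate_mb_s / SECONDS_PER_DAY


# Assumption: the resync volume is inferred, not quoted in the article.
volume = implied_volume_mb(5, 6)  # network: ~5 MB/s for ~6 days
print(f"implied volume: {volume / 1_000_000:.2f} TB")     # 2.59 TB
print(f"network: {transfer_days(volume, 5):.1f} days")     # 6.0 days
print(f"shipped disk: {transfer_days(volume, 11):.1f} days")  # 2.7 days
```

Under these assumptions the resync was roughly 2.6TB, and shipping it cut the transfer from about 6 days to under 3.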
LET OTHER PEOPLE HELP
You really need at least one other person who can take on-call duties when you're away, but if that's not possible, or as a backup, you could make use of services provided by your hosting company or a third party.
We quickly moved from Slicehost to managed servers at Rackspace, and they were able to do monitoring and respond to issues like servers being down or services not running. They took special instructions for different scenarios, and you could always phone them and ask them to perform certain actions. I remember several instances where I was away from my computer and was able to phone Rackspace support, asking them to perform some basic recovery actions whilst I got back online.
CONSIDER SUPPORT CONTRACTS
In addition to general sysadmin support from your hosting provider, you can buy commercial support contracts for the software products you're using. This could be Ubuntu Linux, Nginx or MongoDB. Depending on the level of support, you can get some pretty involved help when you need it most.
However, they're often very expensive and unaffordable as a startup. Even with the greater resources we now have, support contracts are aimed at enterprises with big budgets. One way to work around this is to be very involved with the projects you use. I was an early adopter of MongoDB and have a close relationship with 10gen, the company behind it, so I am able to get good deals on support.
Also consider what support you really need. Our support contract for MongoDB was well used in the early days because it was a new technology. It's significantly more stable nowadays, and with other products, like Apache for example, we've never had an issue.
GROWING AN OPS TEAM FROM ONE FOUNDER
David Mytton
@serverdensity
David Mytton is the founder of Server Density. He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan.

FIGURE OUT WHAT YOU HAVE TO DO AND WHAT CAN BE OUTSOURCED
I consider keeping core engineering in-house very important for technology/software companies, but there are lots of things that need doing to run operations that could be outsourced to (trusted) individuals on an ad-hoc basis. Engineers are terrible at valuing their own time and often use the argument "why pay for something I could build/install/configure myself?". Candidates for this are things like running through PCI compliance checklists, setting up centralised logging, reorganising servers (e.g. upgrading base OSs), researching CDN providers, integrating CI tools, etc. You always want someone technical managing the project to keep things on track and validate the end results, but these are things you don't need to do yourself.
HACK TRAVELING
As part of the founding team, and even as an engineer, you're likely to have to travel at some point: to conferences, meeting customers, pitching vendors or maybe on holiday! It's relaxing to be uncontactable on the plane, but it's also scary because you have no idea if everything is still running. On one of my trips to Japan, as soon as I stepped off the 12-hour flight to Tokyo Narita, I had a flood of SMS alerts, as one of our MongoDB servers had encountered a problem 4 hours previously. One of our engineers had been assigned on call for my flight and had already worked with the guys at 10gen and resolved the problem.
You'll realise you become a slave to connectivity, so trips to Japan are fine, but Tajikistan isn't really an option. You need to be able to get internet access and power wherever you are, using tricks such as visiting Starbucks, carrying external hotspots and not running things like updates when you're away!
DON'T FORGET THE HUMAN ASPECT
There are a lot of cool tools which help to automate processes, and these should be used as much as possible. However, it's still real people running things in the end. This is the really difficult bit of having a small team, because everyone has to pitch in and it can be difficult to share the workload when just a few people know how things work.
You have to consider who will take the
call when things break:
How quickly can they get to a computer they can use to fix things? Are you out drinking on a Friday?
What happens if someone falls ill? This could be a minor cold or a major emergency, and it could affect the individual engineer or their family members. Does the on-call engineer have enough phone battery? Can they hear their ringtone?
Who is backup if the primary doesn't pick up? This matters especially with outages: they often happen at inconvenient times, and big incidents might require you to work for significant periods of time.
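The "who is backup if the primary doesn't pick up?" question is essentially an escalation chain. The sketch below models it with a hypothetical `Responder` type whose `page` callable stands in for whatever actually pages a phone; real paging services are not modelled here, and "acknowledged in time" is simplified to a boolean.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Responder:
    name: str
    # Hypothetical stand-in for a real pager/SMS integration:
    # returns True if the person acknowledged the alert in time.
    page: Callable[[str], bool]


def escalate(alert: str, chain: List[Responder]) -> Optional[str]:
    """Page each responder in order until one acknowledges.

    Returns the name of whoever acknowledged, or None if nobody did
    (the case to design against: primary and backup both unreachable).
    """
    for responder in chain:
        if responder.page(alert):
            return responder.name
    return None


# Usage: the primary is on a plane, so the backup picks it up.
primary = Responder("primary", page=lambda alert: False)
backup = Responder("backup", page=lambda alert: True)
print(escalate("mongodb replica lagging", [primary, backup]))  # backup
```

A `None` result is the signal that the rota itself has a gap, which is exactly the small-team risk described above.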
Communicating with customers, fixing problems and recovering data can be exhausting, especially when there's nobody else to help. The ultimate goal is to build your team so that shift-based on-call cover can be provided, but it's difficult in the beginning with limited resources (both for people and for multi-geographic redundancy). Nobody is as invested in your service as you and your team. Although services like Rackspace's support are helpful in certain situations, they're never able to know the full story behind your service and how to deal with complex components. For example, MongoDB was a completely new database and didn't have single-server durability for some time: a bad shutdown could require a lengthy database repair, so it was important to take steps to avoid one, such as properly shutting it down before powering off the server.
Knowing about the weaknesses and how to deal with them requires greater knowledge of your setup than basic vendor support is going to provide. These services should be a stopgap, or a supplement to the end goal of growing your own team. The whole point of devops is that it's a mixture of engineering and operations, so you don't need to hire dedicated sysadmins. This works well for small startup teams, but you will eventually want someone (or multiple people) responsible for the day-to-day operations. Engineers still engage with the team, can deploy, work on testing and debug problems, but things like dealing with a failed disk drive or implementing backups are really outside the remit of devops in a large team.
You know you're there when you can start hiring site reliability engineers!
MATTHEW SKELTON ON SOFTWARE OPERABILITY

WHAT IS SOFTWARE OPERABILITY AND WHY IS IT IMPORTANT?
Operability is an engineering term concerning the qualities of a system which make it work well over its lifetime, and software operability applies these core engineering principles to software systems. An operable software system is one which delivers not only reliable end-user functionality, but also works well from the perspective of the operations team. Such software has been built to operate successfully without needing application restarts, server reboots, load-balancer hacks, or any of the countless other fixes and work-arounds which operations teams have to use in order to make many business software systems work in practice on a daily basis. Software systems which follow software operability good practice will tend to be simpler to operate and maintain, with a reduced cost of ownership and almost certainly fewer operational problems.
WHERE DID YOUR INTEREST IN OPERABILITY COME FROM?
Early in my career I built software systems for MRI (brain) scanners and oil & gas exploration. Operability for such systems is essential; it's no use building an MRI scanner which can produce 3D brain images if it needs rebooting after taking every second image. Likewise, it was cheaper to drill a new oil well than to extract a faulty down-hole pressure gauge; these systems had to operate reliably with minimal human intervention. Since then I have too often seen the negative effects of operational features being dropped before go-live, which usually results in significant operational costs and more incidents in Production. There is no good reason in 2013 why businesses should put up with (and pay for) second-rate software which needs arduous human attention every few hours or days just to maintain normal operation. In my experience, most modern business software is simple enough (at a systems level at least) that we can significantly reduce operational cost and downtime by introducing software operability as a key concern for software product delivery teams. Ultimately, it's about lower cost of ownership, better engineering, and fewer late nights debugging flaky software!
WHAT ARE SOME OF THE LOW-HANGING FRUIT A SOFTWARE TEAM CAN TACKLE TO MAKE THEIR SOFTWARE MORE OPERABLE?
The best thing a software team can do to make their software more operable is to write a draft operation manual alongside feature development. The operation manual (aka run book) eventually contains the full details of how the software system is operated in Production. By writing a draft operation manual, the software team can demonstrate to the operations folks either that all the major operability concerns have been addressed, or that some operability criteria are beyond the expertise of the software team; either way, there will be no nasty surprises when the software is put into operation. Having to think about things like backups, time changes, health checks, and clear-down steps in the context of their software tends to mean that the software team members will implement small but crucial changes to provide hooks for monitoring, alerting, backups, failover, etc., which improve the operability of the software.
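One of the hooks mentioned above, a health check, might look something like the sketch below. The check names, the 10% free-disk threshold and the report shape are all assumptions for illustration; only `shutil.disk_usage` is a real standard-library call. A web framework or load balancer would typically call `health_report` from an HTTP endpoint.

```python
import json
import shutil
from typing import Callable, Dict


def disk_ok(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True if at least min_free_fraction of the filesystem at path is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def health_report(checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Run each named check and summarise the results for operators.

    A check that raises is reported as failed rather than crashing the
    health endpoint itself.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}


# Usage with a hypothetical database check that is currently failing:
report = health_report({"disk": disk_ok, "database": lambda: False})
print(json.dumps(report))  # status is "degraded" because "database" failed
```

Returning a per-check breakdown, rather than a bare up/down flag, is what gives operations the "right amount of good-quality information" about the service state.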
BEYOND THAT, WHAT WOULD REPRESENT A HIGHER LEVEL OF OPERABILITY?
Software with a high level of operability is easy to deploy, test, and interrogate in the Production environment. Highly operable software provides the operations team with the right amount of good-quality information about the state of the service being provided, and will exhibit predictable and non-catastrophic failure modes when under high load or abnormal conditions. Systems with good software operability also lend themselves to rapid diagnosis and simple recovery following a problem, because they have been built with operational criteria as first-class
concerns.

Matthew Skelton
@matthewskelton
Matthew has been building, deploying and operating commercial software systems for over 13 years. He has engineered software systems for organisations in finance, insurance, pharmaceuticals, travel and media, as well as for MRI brain scanners and oil and gas exploration. He looks after build and deployment at thetrainline.com, the UK's leading rail ticket vendor, which operates one of the country's busiest web infrastructures.

HOW DO YOU MAKE THE CASE FOR OPERABILITY WHEN THE MAIN BUSINESS FOCUS IS USUALLY ON FEATURES?
I think one of the most
important changes to make is to stop using the term "non-functional requirements" for things like performance and stability requirements; instead, use the term "operational requirements", or even better, "operational features", and include these in the product backlog alongside end-user features. This gets away from the artificial (and unhelpful) contrast of functional vs non-functional requirements, and helps to communicate to the business that the operational aspects of the software also require specific features if the business requirements are going to be met.
A useful approach (discussed at the excellent DevOpsDays 2013 event in London) is to make the product owner responsible not only for feature delivery but also for the operational success of the software; after a few early-morning Priority 1 call-outs due to the application servers needing a restart, the product owner will probably start to realise the importance of operational features! Making any operational problems more visible is also crucial. If the operations team needs to restart the app servers every night, make this visible, and include the product owner or business sponsor in the email notifications every day. Draw analogies with systems familiar to the product owner: if they had to have their car fixed by a mechanic every two days, they'd soon either buy a new car or pay to have the faulty part replaced. So, don't hide the effort which you're expending on keeping their software product running; make sure they see the cost (and the pain!).
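The "include the product owner in the email notifications every day" suggestion can be as simple as a daily summary of manual restarts. In this sketch the event format (timestamp, server), the server names and the recipient address are hypothetical; it only builds the message with the standard library's `email.message.EmailMessage`, leaving delivery (for example via `smtplib`) to whatever mail setup you already have.

```python
from collections import Counter
from datetime import date, datetime
from email.message import EmailMessage


def build_restart_report(events, day: date, recipients) -> EmailMessage:
    """Summarise one day's manual restarts into an email for the product owner.

    `events` is a list of (timestamp, server) tuples from some hypothetical
    restart log; only events falling on `day` are counted.
    """
    todays = [server for ts, server in events if ts.date() == day]
    counts = Counter(todays)
    lines = [f"Manual restarts on {day.isoformat()}: {len(todays)}"]
    for server, n in sorted(counts.items()):
        lines.append(f"  {server}: {n}")
    msg = EmailMessage()
    msg["Subject"] = f"[ops] {len(todays)} manual restart(s) on {day.isoformat()}"
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(lines))
    return msg


# Usage with made-up restart events:
events = [
    (datetime(2013, 4, 11, 2, 10), "app01"),
    (datetime(2013, 4, 11, 2, 15), "app02"),
    (datetime(2013, 4, 12, 2, 5), "app01"),
]
report = build_restart_report(events, date(2013, 4, 11), ["product.owner@example.com"])
print(report["Subject"])  # [ops] 2 manual restart(s) on 2013-04-11
```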
WHERE SHOULD WE LOOK FOR FURTHER INFORMATION ON OPERABILITY?
A good starting point to learn more about software operability is the excellent book Patterns for Performance and Operability by Ford, Gileadi, et al. (ISBN 978-1420053340), which explains the core concepts and works through several real-world examples. In the 1980s and 90s the US space agency NASA did some really useful work on operability as part of the space shuttle programme, and much of the research is available online; Richard Crowley's talk on "Developing Operability" at SuperConf 2012 is also worth reading and understanding. I recently began a blog at softwareoperability.com which I plan to turn into a book in late 2013 or early 2014 to help software teams get to grips with software operability. It's worth saying that teams with a DevOps approach will generally produce systems with better operability than teams split into the traditional Dev and Ops silos.
I'm approaching software operability from this siloed world of Dev and Ops mainly because this is where most organisations still are today, and in fact I hope that by gaining a better understanding of software operability, many engineering teams will move instinctively towards a DevOps model.
More info can be found at softwareoperability.com, @Operability and #operability on Twitter.
DevOps stems from the idea that developers and operations should work more closely together, communicating, knowledge sharing, and collaborating to increase the quality of the systems that we build and operate.

Though DevOps is an idea that is finding a lot of success and adoption, most of the enthusiasm and thought leadership appears to come from the Operations side of the fence.
This is of course understandable. With Operations teams being on the front line and talking to end users daily, they have an obvious motivation not to upset customers through downtime, and an obvious personal motivation to avoid fire-fighting issues in favour of working on higher value projects.

However, as a developer who has always worked at this intersection of the two teams, I feel that developers should also sit up and give more credence to what is coming out of the DevOps community. By opening up communication paths and adopting Operations-like skills and mindsets, we can likely all benefit, both as individuals and as teams, and in the quality of software that we deliver.
Here are some o the reasons why I
think this is the case:
sotware bug, ahardware outage or a ailed
rollout. They wont care i it was a human
error or some arbitrary combination o
events. All they care about is that they
cant use the system as intended.
This might be a product o the systems on
which Ive worked, but with good unit and
integration testing and good QA testing,
it is possible to catch most sotware bugs
that would impact a large percentage o
the user base. However, where things more
typically go wrong is when the system
comes into contact with the real world.
For instance, we might nd our code
perorms badly under real world load,
that a disk lls up, or that users use the
application in a way we didnt anticipate
as they are prone to do!
A DevOps oriented developer or team
have a much more stringent ocus on
these issues and general site reliability.
Theyll not only test their code; they will
think about ailure scenarios and mitigate
them beore code is even released. Theyll
think careully about detailed testing o
their eatures to minimise the risk o them
impacting the broader production system.
They will plan and stage their upgrades
to de-risk releases, and always have a
rollback strategy. They will talk regularly
with operations to ensure that they are
taking into account their experience
with keeping the site available. In short,
devoPs IncReAses
The focus on PRoducTIon
Though sotware teams might divide
themselves into development, QA and
operations, these can be slightly arbitrary
distinctions. The business who are paying
or all o this only care about the net
output o what those three teams deliver
the value that the nished production
sotware is bringing to the organisation.
Our goal as developers should be to deliver
not just source code but a reliable product,eature or system that is in production and
that people will gain business benet rom.
Though we might be personally motivated
by cutting code, it is all or nothing i our
work never makes it to production, or i
the users o the application have a bad
time once its out.
To my mind, the operational ocus on
production and delivery espoused by
DevOps is a good thing which usually
leads to much more net value or thebusiness. DevOps oriented development
teams have a ocus on value and their
user base, rather than their code base.
devoPs heLPs youImPRove youR sITeReLIABILITy
I your application has downtime,
customers wont care i its due to a
Why Devops matters
(to Developers)Benjamin Wootton is the Principal Consultant atAutumn Devops, a London, UK based
consultancy specializing in DevOps and sotware release automation. He has over 10
years experience working at the intersection o agile Java sotware development and
operations. He is the maintainer o the popularDevOps Friday newsletter.
Benjamin Wootton
@BenjaminWootton
http://benjaminwootton.co.uk/how-to-do-rollback-well/http://www.autumndevops.com/http://localhost/var/www/apps/conversion/tmp/scratch_10/devopsfriday.comhttp://localhost/var/www/apps/conversion/tmp/scratch_10/twitter.com/BenjaminWoottonhttp://localhost/var/www/apps/conversion/tmp/scratch_10/twitter.com/BenjaminWoottonhttp://localhost/var/www/apps/conversion/tmp/scratch_10/devopsfriday.comhttp://www.autumndevops.com/http://benjaminwootton.co.uk/how-to-do-rollback-well/ -
7/28/2019 DevOps Friday - 12th April 2013
11/15
11
DevOps rightly places site reliability front and center, and almost all developers will benefit from this mindset.
This focus on site reliability might mean that sometimes we churn out fewer pure lines of code in a day, but it means that we move forward more predictably and reliably, keeping the system stable and available.
DevOps Helps You Build Better Software
By being more operationally aware of the production context that our code lives within, developers will also design and build better software.
It might be something simple like choosing to add that additional logging statement that you know will make troubleshooting easier later on, or something more complex such as designing a component for horizontal scalability for future growth scenarios. These kinds of operationally aware decisions can lead to massive improvements in the net productivity of the team and the quality of their software.
It's only by increasing communication with operations teams that we developers will learn about these concerns and incorporate them into our designs and everyday coding decisions. Simple things such as joint production incident post-mortems or the inclusion of operations staff in your early design process can help you to move in the right direction. Again, these practices are core to the DevOps philosophy.
DevOps Improves Your Career Prospects
Beyond making you a more well-rounded developer with a focus on production, operations skills such as system administration, monitoring, scripting, change management and the broader knowledge and experience required to maintain and run complex systems are genuinely useful for developers to acquire.
In most of the software teams and hiring decisions I have been involved in, a developer with this profile would have been more valuable than someone with superior coding skills but without the same degree of production awareness.
I believe that as a result of DevOps and other trends, this will continue, i.e. that the best developers will increasingly be those who are the most operationally aware, who can code but also have the knowledge, skills and experience to reliably deliver a working production system over the long haul. This is particularly true in these tough and resource-constrained economic times. With fewer people having the luxury of saying "it's not my job", the generalist will get ahead. (I guess for some people, such as those in startups and small companies, it was ever thus, with developers pitching in on operations-type stuff such as deployments and upgrades.)
DevOps Helps Developers to Own Their Platforms and Infrastructure
A big element of the DevOps movement is the idea of infrastructure as code: that we can define our infrastructure and configuration in descriptive files and metadata, and then be able to test and repeatably deploy that infrastructure and our applications on top of it.
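To make the idea concrete, here is a minimal Python sketch of declarative, idempotent convergence, assuming a toy desired-state dictionary rather than any real tool's file format. The `DESIRED` schema and the `plan`/`converge` helpers are hypothetical illustrations, not Puppet's or Chef's API; the point is that a versioned description can be diffed against reality, inspected before anything runs, and reapplied safely.

```python
# Hypothetical desired state, as it might be kept in a versioned file.
DESIRED = {
    "packages": {"nginx", "statsd"},
    "files": {"/etc/app.conf": "workers=4\n"},
}


def plan(desired: dict, actual: dict) -> list:
    """List the changes needed to move `actual` towards `desired` (a dry run)."""
    changes = []
    for pkg in sorted(desired["packages"] - actual["packages"]):
        changes.append(("install", pkg))
    for path, content in sorted(desired["files"].items()):
        if actual["files"].get(path) != content:
            changes.append(("write", path))
    return changes


def converge(desired: dict, actual: dict) -> list:
    """Bring `actual` in line with `desired` and return what changed.

    Each step only acts on a difference, so a second run is a no-op.
    """
    changes = plan(desired, actual)
    for kind, target in changes:
        if kind == "install":
            actual["packages"].add(target)  # stand-in for a real package install
        else:
            actual["files"][target] = desired["files"][target]  # stand-in for a file write
    return changes


server = {"packages": {"nginx"}, "files": {}}  # hypothetical current state
first_run = converge(DESIRED, server)
second_run = converge(DESIRED, server)
```

Because `plan` has no side effects, it is one plausible way to "test" an infrastructure change in a QA environment before the same description is applied to production.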
This is such a compelling idea with many benefits, and yet developers do not always embrace and own configuration management tools as much as our operations colleagues. By moving towards infrastructure as code and configuration management, developers are given the ability to own and bring under their control the infrastructure that their code runs on.
People often say that Apple computers are so reliable because Apple owns the full hardware and software stack. Well, with infrastructure as code and repeatable deploys, developers also get to develop, deploy and own the whole platform on which their software is deployed.
"It worked on my machine" or "it worked in QA" should be a thing of the past in a mature DevOps team making use of tools such as Vagrant and Puppet, because the development, test, and production environments should all be in line, and all infrastructure changes should also be versioned and tested alongside the code assets.
Doing this well removes so many unknowns and can lead to massive improvements in the efficiency and quality of software development.
DevOps Helps You Manage Modern Infrastructure
DevOps has emerged at a time when cloud hosting, infrastructure as a service, and platform as a service are also reaching widespread adoption. Cloud and PaaS make the hosting environment much more fluid. For instance, over time operations might want to use these platforms to their full potential and scale capacity up or down dynamically. To do that, they will need to be working with development much more closely to work out how to support this in the applications.
Because of this, I would argue that developers today need to be more aware of the operational environments in which their applications will operate.
Increasingly, we will also find that cloud infrastructure will be managed through software, for instance by provisioning new boxes via APIs or deploying applications onto a PaaS. Managing large-scale infrastructure in an automated fashion is likely to look more and more like development work. Development and operations will increasingly start to look like one and the same role.
So these are just a few of the reasons why I think developers need to look at DevOps in a lot more detail. Some of this is about a broadening of mindset, from "my job responsibility is to deliver good code" to "my job is to deliver and operate a successful system". The rest is about acquiring the skills that will actually allow you to do that. With operations staff also actively moving towards more of a developer mindset and skillset, DevOps is likely to continue to grow in importance.
The State of the Art Monitoring Stack
Last week I had the pleasure of attending the first annual Monitorama conference. This was a conference aimed towards advancing the state of open source monitoring and trending software.
For many of us starting in this area, our concept of monitors consists of top, some Apache, MySQL and application log files and perhaps an external ping service that tells us when our web site is unavailable.
Anything beyond that generally ran into the commercial product realm. We were scared off monitoring by these old monolithic products that required huge licensing fees and armies of professional services people. Thankfully, times have changed.
And our application footprint has grown. No longer are we just deploying web servers and databases. Our application stack starts with our automated testing framework and runs through continuous integration and continuous deployment. Jenkins, Travis, Puppet/Chef, etc.: they're all critical. It also includes our deployment partners, that army of SaaS applications we use to make our life easier. Any SaaS solution worth its salt has a status API available for tracking availability. Our monitoring needs are now wide and diverse.
My first exposure to the next generation of monitoring tools came with the awesome Etsy post "Measure Anything, Measure Everything".
The concept of "Measure Everything" wasn't new to me. I'd been working on StackTach for OpenStack around the same time and understood the value of getting a visual representation of the internals of an application. Even in my old management days we used to say "you can't manage what you can't measure." I lived this with my Google Analytics experiences from running various web sites, and my software development management interests were aiming towards Six Sigma techniques over the hand-wavey agile methods. Essentially, numbers are good. But this was giving us a way to apply those same measurement techniques to running software. It was a lens into the black box. Could the days of parsing log files be over?
The first generation of these new monitoring tools included Zenoss, Nagios, RRDtool, Cacti, Munin and Ganglia, to name a few. They were built out of necessity and often have some really nasty warts that people just hate. This latest generation of tools has learned from their mistakes.
The Etsy tool chain started with statsd and graphite. This introduced me to the concept of using UDP packets for instrumenting running applications, which was pretty brilliant. For those unfamiliar with statsd and graphite, here's the flow: your application wants to measure something, so it sends a UDP packet to the statsd service. UDP packets are lossy and unreliable but fast for large amounts of data. Most large video networks send via UDP packets.
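A minimal Python sketch of the sending side, assuming statsd's plain-text `name:value|type` line format (`c` for counters, `ms` for timers) and its conventional default port 8125. The helper names and metric names are hypothetical. The send is fire-and-forget: nothing is read back, so the application never blocks waiting on statsd.

```python
import socket


def format_metric(name: str, value: float, metric_type: str = "c") -> bytes:
    """Encode one metric as a statsd line, e.g. b'myapp.signups:1|c'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value: float, metric_type: str = "c",
                host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Send a single UDP datagram and never wait for a reply."""
    payload = format_metric(name, value, metric_type)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    except OSError:
        # If statsd is down or unreachable the packet is simply lost;
        # the application carries on unaffected.
        pass
    return payload


send_metric("myapp.signups", 1)               # count one signup
send_metric("myapp.db.query_ms", 12.5, "ms")  # record a 12.5 ms query time
```

Only the standard library socket module is needed, which is part of why adding this kind of instrumentation to an existing application is cheap.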
statsd is a node.js in-memory data aggregator (it accumulates received data and every so often sends it to graphite). graphite is a Django app that archives received data and gives a funky web interface for presenting and querying the data.
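The statsd-to-graphite hand-off can be sketched as a toy in-memory aggregator. This illustrates the idea and is not statsd's actual implementation: the `MiniAggregator` class and the `stats.` naming prefix are hypothetical, and only counters and timer means are handled. `flush` emits lines in Graphite's plaintext `path value timestamp` format, which a real forwarder would write to Carbon's plaintext listener (conventionally TCP port 2003).

```python
import time
from collections import defaultdict


class MiniAggregator:
    """Toy in-memory aggregator in the spirit of statsd (not its real code).

    Handles counters (|c) and timers (|ms); other types and sample rates are ignored.
    """

    def __init__(self):
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def ingest(self, line: str) -> None:
        """Parse one 'name:value|type' line and accumulate it in memory."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "c":
            self.counters[name] += float(value)
        elif metric_type == "ms":
            self.timers[name].append(float(value))

    def flush(self, now=None) -> list:
        """Return Graphite plaintext lines ('path value timestamp') and reset state."""
        ts = int(now if now is not None else time.time())
        lines = [f"stats.{name} {total} {ts}"
                 for name, total in sorted(self.counters.items())]
        for name, values in sorted(self.timers.items()):
            mean = sum(values) / len(values)
            lines.append(f"stats.timers.{name}.mean {mean} {ts}")
        self.counters.clear()
        self.timers.clear()
        return lines


agg = MiniAggregator()
for packet in ["signups:1|c", "signups:2|c", "query:10|ms", "query:20|ms"]:
    agg.ingest(packet)
flushed = agg.flush(now=1000)
print(flushed)
```

Keeping everything in memory and handing archiving off to something else is the division of labour the text describes: the aggregator stays fast, and durable storage is graphite's job.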
There are a number of cool things happening here:
- Adding statsd integration to an existing application is very easy. No special libraries are needed, and sockets are available in nearly all languages.
- Since statsd uses UDP, there is very little risk of the production application crashing if statsd fails. The packets just get lost.
- Since statsd is in-memory, it can process a lot of data very quickly. But rather than take on the task of archiving and disk access, it simply forwards the results to something that can do it better.
- graphite has an easy REST interface, which makes it easily accessible to technical product managers to create their own dashboards and status reports.
Based in Nova Scotia, Canada, Alex "Sandy" Walsh is the owner of Dark Secret Software. He has been a senior professional developer for nearly 20 years and a Pythonista for 10 years. He is currently a developer on the OpenStack project with Rackspace. You can learn more about him at sandywalsh.com or follow @TheSandyWalsh.
Sandy Walsh
@TheSandyWalsh
Side note: if your application is written in Python and you want to experiment with this stuff without touching your existing code base, have a look at the Tach application. This monkey-patches your Python application and sends the output to statsd or graphite directly. Pretty cool. Although it was originally written for use with OpenStack, it can work with anything.
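Tach's own code is not shown here, but the monkey-patching technique it relies on can be sketched in a few lines of Python. The `instrument` helper, the `billing` module and its `charge` function are hypothetical; `emit` stands in for whatever forwards the timing to statsd or graphite, and is any callable taking a metric name and a duration in milliseconds.

```python
import functools
import time
import types


def instrument(target, func_name, emit):
    """Monkey-patch target.func_name so every call reports its duration via emit().

    The original source is untouched; the attribute is swapped at runtime.
    """
    original = getattr(target, func_name)
    prefix = getattr(target, "__name__", "app")

    @functools.wraps(original)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Report even if the call raised, so failures still show up as timings.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            emit(f"{prefix}.{func_name}", elapsed_ms)

    setattr(target, func_name, timed)
    return original  # keep a handle so the patch can be undone


# Usage with a stand-in module (hypothetical names).
billing = types.ModuleType("billing")
billing.charge = lambda amount: amount * 2
recorded = []
instrument(billing, "charge", lambda name, ms: recorded.append((name, ms)))
result = billing.charge(21)
```

Callers keep calling `billing.charge` as before; only the attribute lookup now resolves to the timing wrapper, which is what lets this approach work without editing the application's code.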
But the real insight here is a set of atomic, well-focused tools that can be put together to create a monitoring stack. The tool chest of the DevOps team just expanded.
As our experience with statsd and graphite grew within the company, we also saw where the monitoring stack failed. A UDP-based approach won't work for billing or auditing. For these scenarios you need a reliable transport for events. In OpenStack we publish event notifications to AMQP queues for consumption by various other tools. These are important events, often with large payloads. When the StackTach application is unavailable these queues can grow very quickly, and we don't want to drop events. This is manageable for something like OpenStack Compute, but other applications like Storage produce an incredible amount of data across a wide range of servers, so using a notification-based system would be difficult. Instead we needed to look at syslog-based archiving and processing solutions. The new monitoring stack offers tools like LogStash and, in the OpenStack case, Slogging.
Then there is the post-processing. To add value to the raw events we often need to apply other functions to the data, such as time series averaging. This can be tricky. We need to wait for all the collected data to arrive before we can start the post-processing, and we may need to ensure proper ordering. Historically this would be done with cron jobs and batch processing, but the new monitoring stack includes tools like Riemann which can do this post-processing inline.
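A minimal Python sketch of the batch-style version of this step, assuming events arrive as `(timestamp, value)` pairs in any order. The `window_averages` function and its `watermark` parameter (the time up to which we assume all data has arrived) are hypothetical. This is not how Riemann expresses it (Riemann streams are written in Clojure); it only illustrates why ordering and completeness matter.

```python
from collections import defaultdict


def window_averages(events, window_s, watermark):
    """Average values per fixed-size window, emitting only windows that are complete.

    events    -- iterable of (timestamp_seconds, value), possibly out of order
    window_s  -- window length in seconds
    watermark -- time up to which all data is assumed to have arrived
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Bucketing by window start makes arrival order irrelevant.
        buckets[ts - ts % window_s].append(value)

    averages = {}
    for start in sorted(buckets):
        if start + window_s <= watermark:  # skip windows that may still receive data
            values = buckets[start]
            averages[start] = sum(values) / len(values)
    return averages


# Out-of-order events; the window starting at 10 is still open at t=15.
events = [(12, 4.0), (3, 1.0), (7, 3.0), (15, 8.0)]
print(window_averages(events, window_s=10, watermark=15))  # {0: 2.0}
```

Raising the watermark to 20 would also emit the second window (average 6.0). The trade-off is latency: the longer you wait before closing a window, the fewer late events you miss.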
It seems evident that Nagios isn't going anywhere any time soon, but there are some other tools offering alternatives, such as Shinken and Sensu.
Recently our team has been working on bringing what we've learned with StackTach to the OpenStack-blessed monitoring solution called Ceilometer. Without standing back and looking at the larger monitoring community, it would have been very easy to want to recreate an entire monitoring stack on our own. But now it's clear that we can focus on the minimal set of missing functionality and augment that from an already powerful set of tools. This is a very attractive proposition for one simple reason: the project has an end in sight. There are lots of fun problems out there to tackle, and knowing you don't have to reinvent the wheel is very compelling.
There is a cost, though. The monitoring stack today consists of a variety of tools, all written in different languages and each with different care and feeding instructions. One could argue that the workload on operations will only increase by mixing and matching. My knee-jerk reaction is to agree, but I know that the greater win is to get familiar with all of these new tools. In production, these monitoring tools need monitoring as well. So we may have to monitor Java, Ruby, Python and C# VMs running bytecode from a potential variety of languages.
If this all seems too daunting, perhaps the hosted offerings are a better choice for you. For nearly every open source offering there is a hosted offering. Look at Loggly, Papertrail, PagerDuty, Librato, Datadog, Hosted Graphite, Boundary, New Relic, etc.
This brings me back to Monitorama. The Monitorama conference had a format that worked very well for me for the following reasons:
- It was only two days long.
- Day One focused on hearing about the state of the art from industry leaders.
- Day Two was tactical with the tools and included a hackathon which let you understand where the real-world pain lived in each of these components.
- It was small enough that you could actually talk to people and have meaningful conversations.
The Day One talks made it clear that Alert Fatigue (a term borrowed from the medical industry) is a big problem: too many alerts hitting our inbox. Some are important, most are noise. There are people working on it, but it's perhaps the biggest source of angst for operations currently.
Side story: for the hackathon I started work on a tool that allowed members of the company to track external events that might affect production. Things like sales events, big holidays and new customer deployments, or internal events such as new code deployments, hardware upgrades, etc. The idea was to have these events show up on the spikes in the dashboard graphs so we could say "That spike was due to Foo and that ravine was due to Blah." I made some good progress for the day, and then one of the other attendees showed me his side project Anthracite, which does all this and more. The author was sitting in the room next to me. What are the odds?
For a while I was getting disillusioned with this space because I saw it as dominated by commercial solutions, or the problem as so big that it would be a lifetime of work to build as open source. But now I see there are viable open source components, and there is enough of the stack available that we can focus on some of the smaller missing pieces. Also, there is a smart community out there facing the exact same problems and actively working on solutions. There is a light at the end of the tunnel.
I may not attend Monitorama EU, but I will definitely attend the US one next year. But for now, I've got some products to learn.
The Benchmark You're Reading Is Probably Wrong
Mikeal Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response model, and makes the following statement:
"MongoDB, by default, doesn't actually have a response for writes."
In response, one of the 10gen employees (the company behind MongoDB) made the following comment on Hacker News:
"We did this to make MongoDB look good in stupid benchmarks."
The benchmark in question shows a single graph, which demonstrates that MongoDB is 27 times faster than CouchDB on inserting one million rows. At first glance, the benchmark immediately looks silly if you've ever done serious benchmarking before. CouchDB people are smart, and inserting such a small number of elements is a relatively simple feature, so it's almost certain that either they would have fixed something that simple or they had a very good reason not to (in which case the benchmark is likely comparing apples and oranges).
Let's do some back-of-the-envelope math. Round-trip latency on a commodity network for a small packet can range from 0.2ms to 0.8ms. A single rotational drive can do 15000 RPM / 60 sec = 250 operations per second (resulting in close to 5ms latency in practice), and a single Intel X25-M SSD drive can do about 7000 write operations per second (resulting in close to 0.15ms latency).
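The arithmetic above can be checked with a few lines of Python. Note that 1000 / 250 gives a 4ms lower bound per rotational-drive operation; the "close to 5ms in practice" figure allows for overheads such as seek time. The helper name is illustrative.

```python
def per_op_latency_ms(ops_per_second: float) -> float:
    """Average time per operation, in milliseconds, for a device doing ops_per_second."""
    return 1000.0 / ops_per_second


disk_ops_per_s = 15_000 / 60                  # 15000 RPM drive: 250 operations per second
disk_ms = per_op_latency_ms(disk_ops_per_s)   # 4.0 ms lower bound; ~5 ms in practice
ssd_ms = per_op_latency_ms(7_000)             # X25-M figure from the text: ~0.14 ms

print(disk_ops_per_s, disk_ms, round(ssd_ms, 3))  # 250.0 4.0 0.143
```

Set against the 0.2ms to 0.8ms network round trip, the SSD sits below typical network latency and the rotational drive well above it.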
The benchmark demonstrates that CouchDB takes an average of 0.5ms per document to insert one million documents, while MongoDB does the same in 0.01ms. Clearly the rotational drives are too slow to play a part in the measurement, and the SSD drives are probably too fast to matter for CouchDB and too slow to matter for MongoDB. However, CouchDB appears to be awfully close to commonly encountered network latencies, while MongoDB inserts each document 50 times faster than commodity network latency.
At first observation, it appears likely that the CouchDB client library is configured to wait for the socket to receive a response from the database server before sending the next insert, while the MongoDB client is configured to continue sending insert requests without waiting for a response. If this is true, the benchmark compares apples and oranges and tells you absolutely nothing about which database engine is actually faster at inserting elements. It doesn't measure how fast each engine handles insertion when the dataset fits into memory, when the dataset spills onto disk, or when there are multiple concurrent clients (which is a whole different can of worms).
It doesn't even begin to address the more subtle issues of whether the potential bottlenecks for each database might reside in the virtual memory configuration, or the file system, or the operating system I/O scheduler, or some other part of the stack, because each database uses each one of these components slightly differently.
What the benchmark likely measures is something that is never mentioned: the latency of the network stack for CouchDB, and something entirely unrelated for MongoDB.
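A rough Python model of the suspected difference, assuming a fixed network round trip and a small per-request send cost. `total_insert_seconds` and its parameters are hypothetical, and the model deliberately ignores disk, batching and concurrency. With a 0.5ms round trip, a single client that waits for each reply needs about 500 seconds for one million inserts no matter what the database engine does, while a client that streams requests without waiting is bounded mainly by the per-request cost, which matches the shape of the 0.5ms versus 0.01ms per-document figures.

```python
def total_insert_seconds(n_docs: int, rtt_ms: float, per_request_ms: float,
                         wait_for_reply: bool) -> float:
    """Estimate wall-clock time to insert n_docs documents from a single client.

    wait_for_reply=True  -> each insert pays a full network round trip.
    wait_for_reply=False -> requests are streamed back to back; the round trip
                            is paid roughly once, at the end.
    """
    if wait_for_reply:
        total_ms = n_docs * (rtt_ms + per_request_ms)
    else:
        total_ms = n_docs * per_request_ms + rtt_ms
    return total_ms / 1000.0


N = 1_000_000
sync_s = total_insert_seconds(N, rtt_ms=0.5, per_request_ms=0.0, wait_for_reply=True)
stream_s = total_insert_seconds(N, rtt_ms=0.5, per_request_ms=0.01, wait_for_reply=False)
print(f"{sync_s:.1f}s vs {stream_s:.1f}s, ratio {sync_s / stream_s:.0f}x")
```

Neither number says anything about how fast either engine writes data, which is the point: the client's request/response policy dominates the result.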
Unfortunately, most benchmarks published online have similar crucial flaws in the methodology, and since many people make decisions based on this information, software vendors are forced to modify the default configuration of their products to look good on these benchmarks. There is no easy solution: performing proper benchmarks is very error-prone, time-consuming work. It's good to be very skeptical about benchmarks that show a large performance difference but don't carefully discuss the methodology and potential pitfalls. As Brad Pitt's character says at the end of Inglourious Basterds, "Long story short, we hear a story too good to be true, it ain't."
This blog post originally appeared on rethinkdb.com/blog in July 2010.
RethinkDB Team
@RethinkDb
The RethinkDB team is working on a scalable, open-source, distributed document database system that features a pleasant query language, parallelized architecture, and table joins. You can learn more at rethinkdb.com.
Sign up at DevOpsFriday.com for next week's issue!