Building trust within the organization, first steps towards DevOps Guido Serra, txtr

Upload: guido-serra

Post on 03-Jul-2015

DESCRIPTION

presented at Berlin DevOps meetup https://www.xing.com/events/berlin-devops-11q4-2-meetup-832934

TRANSCRIPT

Page 1: Building trust within the organization, first steps towards DevOps

Building trust within the organization, first steps towards DevOps

Guido Serra, txtr

Page 2: Building trust within the organization, first steps towards DevOps

What’s the role of a DevOp(s)?

• Deliver

• Be a bridge of trust between DEVs and SysOPs

• Stop the “throw the ball over the fence” game

• Mediate

• Drive non-functional requirements

… DevOp or DevOps, when talking of one person?

Page 3: Building trust within the organization, first steps towards DevOps

Introduce a DevOp(s)

• At txtr, starting as a QA Manager specialised in backend systems seems to have worked

• Other organizations tend to call it Site Reliability Engineer / Site Reliability Operation

• But… QA != Testing, not strictly at least

– Testing should be only a subset of QA, but that is not how it is normally perceived

– Non-functional requirements did not seem to fit in

Page 4: Building trust within the organization, first steps towards DevOps

Non-functional requirements?

• Functional requirements == features

• Non-functional requirements == everything that OPS would need to run the service, or even things that Product Owners would want but have not thought of at design time

– Logging

• Which kind of information?

• How?

– Health checks / Load Balancer required URL

– Live sales report / Dashboard / Charting

Page 5: Building trust within the organization, first steps towards DevOps

Steps that worked so far

• Listen …to OPS, to PMs, to QA, to R&D

• See how people have solved their specific needs when trying to gather information

• Match all the tools that have been built

• Try to gather the essence of those tools, and come up with non-functional requirements

• Discuss those with the R&D organization and push them at Product level to be prioritized over features

Page 6: Building trust within the organization, first steps towards DevOps

TRUST

Means…

• Not having to duplicate work

– wrongly testing the backend to see if it is answering

– or testing to measure the response times

– or creating tests again, when there are plenty of them that are simply not shared and/or broadly understood

Page 7: Building trust within the organization, first steps towards DevOps

The answer is 42?

…no, the answer is DATA!

• By creating a single point of data collection and graphing, people are gaining trust in the backend

• Logs need to be shared too

• Tests need to be commonly understood

Page 8: Building trust within the organization, first steps towards DevOps

SHARE LOGS WITH EVERYONE

Logging

Page 9: Building trust within the organization, first steps towards DevOps

Tools

• Logging

– Slf4j > Log4j / JUL > GELF > GrayLog2

• Logging to syslog from a Java-based backend is pretty bad. The stacktrace becomes very hard to fetch and report in a ticket. Instead, one link and a screenshot, or a cut&paste of a complete stacktrace from a web interface, is much easier to digest

• GELF (Graylog Extended Log Format) is a notification format, encapsulating the full stacktrace as a single message

• GrayLog2 is a Ruby/MongoDB FIFO queue with a nice web interface and an email alerting system
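To make the "full stacktrace as a single message" point concrete, here is a minimal sketch of a GELF payload. It is written in Python purely for illustration (the backend in this talk is Java, where a GELF appender would do this work). The field names follow the published GELF format; the server name `graylog.example.org`, the additional field `_exception_class`, and the helper names are assumptions made up for this example.

```python
import json
import socket
import time
import traceback
import zlib


def build_gelf_message(short_message, exc, host=None, level=3):
    """Build a GELF payload that carries the whole stacktrace in one message."""
    return {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        # The complete stacktrace travels as one multi-line string.
        "full_message": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "timestamp": time.time(),
        "level": level,  # syslog severity, 3 = error
        # Additional fields are prefixed with an underscore (hypothetical field).
        "_exception_class": type(exc).__name__,
    }


def send_gelf_udp(message, server="graylog.example.org", port=12201):
    """Send one zlib-compressed GELF datagram (no chunking for large messages)."""
    payload = zlib.compress(json.dumps(message).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

The sketch does not chunk the payload, so a very long stacktrace could exceed the size of a single UDP datagram.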

Page 10: Building trust within the organization, first steps towards DevOps

Why?

• Slf4j

– It is an abstraction layer on top of logging facilities

• I’ll not explain why an “abstraction layer” is good

• Log4j or JUL, at your choice

– They are the most commonly used

• Means: their code is maintained

• GELF

– It keeps a full stacktrace in a single message. There is no need to reconstruct it from syslog, where it is spread over multiple lines with additional garbage/timestamps

• GrayLog2

– We have an in-house developer, and it is working pretty well

– Has threshold-based alerting per stream of events (regexp)
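The idea of threshold-based alerting on a regexp-defined stream can be sketched in a few lines of Python. This is not GrayLog2's implementation, only an illustration of the concept; the class name `StreamAlert` and the pattern in the usage note are hypothetical.

```python
import re
from collections import deque


class StreamAlert:
    """Alert when too many messages matching a pattern arrive within a time window."""

    def __init__(self, pattern, threshold, window_seconds):
        self.pattern = re.compile(pattern)
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.hits = deque()  # timestamps of the matching messages

    def observe(self, message, timestamp):
        """Record one log message; return True while the stream is over its threshold."""
        if self.pattern.search(message):
            self.hits.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while self.hits and self.hits[0] <= timestamp - self.window_seconds:
            self.hits.popleft()
        return len(self.hits) > self.threshold
```

For example, `StreamAlert(r"PaymentException", threshold=2, window_seconds=60)` would fire on the third matching message within one minute, so the alert follows real traffic rather than a one-time probe.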

Page 13: Building trust within the organization, first steps towards DevOps

Results seen so far

• 1st level support team is gaining trust in the application.

– Logs are getting more and more readable

– Events can be correlated much more easily

• 2nd level support (OPS) can set alert thresholds and react promptly, having alerts tied to real traffic data and not “one time probes”

• I have a better feel for the trend of issues in production, and I don’t have to dig for logs

Page 14: Building trust within the organization, first steps towards DevOps

PRODUCTION PERFORMANCE

Instrumented metrics

Page 15: Building trust within the organization, first steps towards DevOps

Tools

• Instrumented metrics

– JMX > Jolokia > JSON > Graphite

• MIN / MAX / AVG response time of each API

• Worst response times with related API parameters

• Success / failure counters

• All the above aggregated over the last 5 / 15 minutes, 1 hour, 24 hours

• Plus all the standard information exposed via JConsole / JMX
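The per-API figures listed above (MIN / MAX / AVG and success / failure counters over a time window) can be sketched as follows. This is a Python illustration under stated assumptions, not the Java code behind the JMX beans in this talk; the class name `ApiMetrics` and its methods are hypothetical.

```python
from collections import deque


class ApiMetrics:
    """Collect response times for one API and summarise them over a time window."""

    def __init__(self):
        self.samples = deque()  # (timestamp, duration_ms, succeeded)

    def record(self, timestamp, duration_ms, succeeded=True):
        self.samples.append((timestamp, duration_ms, succeeded))

    def summary(self, now, window_seconds):
        """Return min/max/avg and counters for samples not older than the window."""
        recent = [s for s in self.samples if now - s[0] <= window_seconds]
        if not recent:
            return None
        durations = [duration for _, duration, _ in recent]
        successes = sum(1 for _, _, ok in recent if ok)
        return {
            "min": min(durations),
            "max": max(durations),
            "avg": sum(durations) / len(durations),
            "success": successes,
            "failure": len(recent) - successes,
        }
```

Calling `summary` with 300, 900, 3600 and 86400 seconds gives the 5 minute, 15 minute, 1 hour and 24 hour views. The sketch never discards old samples; a real implementation would prune them or keep pre-aggregated buckets.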

Page 16: Building trust within the organization, first steps towards DevOps

Why?

• JMX

– It is built into Java, and it is non-invasive

• R&D loves it, because it does not need an invasive agent, unlike many profiling agents that are normally used in such cases. Standard profiling agents tend to interfere with the application and decrease the overall performance.

– It is a standard, so there are many tools that plug into it natively

• Jolokia

– It is a standard tool that plugs into JMX and exposes it in JSON-encoded format
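As a rough sketch of the JMX > Jolokia > JSON > Graphite chain, the snippet below turns the JSON answer of a Jolokia "read" request into lines of Graphite's plaintext protocol (`path value timestamp`). The metric prefix `backend.jvm`, the function name, and the sample numbers are assumptions for illustration only; fetching the JSON over HTTP and sending the lines to Graphite are left out.

```python
import json

# Shape of a Jolokia "read" response; the numbers are made up for the example.
SAMPLE_RESPONSE = """
{"request": {"mbean": "java.lang:type=Memory",
             "attribute": "HeapMemoryUsage", "type": "read"},
 "value": {"init": 134217728, "committed": 128647168,
           "max": 1908932608, "used": 18204696},
 "timestamp": 1322000000, "status": 200}
"""


def jolokia_to_graphite(response_text, prefix):
    """Convert one Jolokia read response into Graphite plaintext lines."""
    response = json.loads(response_text)
    if response.get("status") != 200:
        raise ValueError("Jolokia request failed: %r" % response.get("error"))
    attribute = response["request"]["attribute"]
    timestamp = response["timestamp"]
    value = response["value"]
    # Composite attributes arrive as a JSON object, simple ones as a bare number.
    if isinstance(value, dict):
        items = sorted(value.items())
    else:
        items = [("", value)]
    lines = []
    for key, number in items:
        path = ".".join(part for part in (prefix, attribute, key) if part)
        lines.append("%s %s %d" % (path, number, timestamp))
    return lines
```

Each returned line, terminated by a newline, is what Graphite's plaintext listener expects (conventionally on TCP port 2003).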

Page 17: Building trust within the organization, first steps towards DevOps

Why?

• Graphite

– It can correlate data from many sources

– Gives me the freedom of structuring graphs as I want, directly from the web interface

• This is a definite WIN over Munin or Cacti

– It lets me select specific timeframes

• Useful in case of outage investigation, something which is not possible with Munin

– Can create dashboards

Page 18: Building trust within the organization, first steps towards DevOps

Data are in transactions “per 5 minutes” in this graph… you can see this specific service is currently being used

Page 19: Building trust within the organization, first steps towards DevOps

100 transactions per second

uhmm… at 7 a.m., ok 11 a.m. in India

someone is testing…

Page 20: Building trust within the organization, first steps towards DevOps

Results seen so far

• No need of load and performance testing

– Apart from specific cases, to try to reproduce an issue and let DEVs work on it.

– Producing a proper load test is problematic, and can lead to false assumptions about the product. Being able to watch what the business logic is doing in production is the best load test.

• DEVs are proactively watching and fixing performance issues on their own. The overall product gets better and better.

Page 21: Building trust within the organization, first steps towards DevOps

SHARE TESTS AND RESULTS

Testing

Page 22: Building trust within the organization, first steps towards DevOps

Tools

• Testing

– BDD / Cucumber-Nagios executed by Jenkins

• Cover all the fast HTTP action via Watir

• API calls via JsonRPC or Soap4r

• Javascript based UI via Selenium / Capybara

• These tests are actually very valuable at deployment time, since there is no need of manual testing. Everything is in the hands of whoever follows the deployment.

Page 23: Building trust within the organization, first steps towards DevOps

Why?

• BDD

– Not everyone wants to read your code

– Not everyone is a coder

– You don’t want to have to explain your test again and again and again, and you hate documenting

• Cucumber-Nagios / Ruby

– It is off-the-shelf, it works.

– It generates a standard JUnit XML report

• Means: it directly integrates with Jenkins (ex Hudson)

– It generates an awesome HTML report

– It can be extended pretty easily
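To show why a JUnit XML report plugs straight into Jenkins, here is a minimal sketch of what such a report contains. It is written in Python for illustration and is not the code Cucumber-Nagios uses; the function name, suite name and scenario names are hypothetical.

```python
import xml.etree.ElementTree as ET


def junit_report(suite_name, results):
    """Render results as JUnit-style XML.

    results is a list of (scenario_name, seconds, failure_message_or_None).
    """
    failures = sum(1 for _, _, failure in results if failure)
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for name, seconds, failure in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name,
                             name=name, time="%.3f" % seconds)
        if failure:
            # A failed scenario carries a <failure> child element.
            ET.SubElement(case, "failure", message=failure).text = failure
    return ET.tostring(suite, encoding="unicode")
```

A CI server that understands this format only needs the `testsuite` / `testcase` / `failure` elements to show which scenarios passed and which did not.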

Page 24: Building trust within the organization, first steps towards DevOps

Why?

• Watir

– It is the default HTTP client in Cucumber-Nagios

• BUT: it has tons of bugs… I have a long backlog to fix

– It is fast

• Soap4r

– Pretty easy SOAP Ruby gem/library

• JsonRPC

– Very simple and basic JSON RPC gem/library

• BUT: it does not support proxy settings
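For readers who have not seen a JSON RPC call, this is roughly what an API test has to build and check. The sketch is in Python and assumes the JSON-RPC 2.0 envelope; whether the backend or the Ruby gem mentioned above speaks 2.0 is not stated in the talk, and the method name `catalog.search` in the usage note is made up.

```python
import json


def build_request(method, params, request_id):
    """Serialise a JSON-RPC 2.0 request envelope."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


def parse_response(text, request_id):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error reply."""
    response = json.loads(text)
    if response.get("id") != request_id:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError("%(code)s: %(message)s" % response["error"])
    return response["result"]
```

A test step would send `build_request("catalog.search", {"query": "devops"}, 1)` over HTTP and assert on the value returned by `parse_response`.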

Page 25: Building trust within the organization, first steps towards DevOps

Why?

• Selenium

– Because it is the only one?

– It supports Javascript

– It supports clustering of testing nodes

– It is supposed to be easy to integrate with Cucumber (it is NOT …I’m working on it)

Page 27: Building trust within the organization, first steps towards DevOps

Upcoming…

• Health checks (normally used for load balancing purposes) to be based on historical business-logic data from within the instrumented metrics

• Continuous integration

– Configuration management

• Data mining
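The upcoming health check idea, answering the load balancer based on recent business-logic results rather than a bare "is the port open" probe, could look roughly like this. It is a Python sketch under assumptions: the function name, the 5% failure limit and the minimum of 20 requests are hypothetical values, not figures from the talk.

```python
def health_status(success_count, failure_count,
                  max_failure_ratio=0.05, min_requests=20):
    """Decide the health check answer from recent success / failure counters.

    Returns an (http_status, message) pair for the load balancer URL.
    """
    total = success_count + failure_count
    if total < min_requests:
        # Not enough traffic to judge, so do not take the node out of rotation.
        return 200, "OK (too little traffic to judge)"
    ratio = failure_count / total
    if ratio > max_failure_ratio:
        return 503, "FAILING (%.0f%% of recent calls failed)" % (ratio * 100)
    return 200, "OK"
```

The counters would come from the same instrumented metrics that feed the graphs, for example the success / failure counts of the last 5 minutes.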

Page 28: Building trust within the organization, first steps towards DevOps

[email protected]

http://slidesha.re/rVzd8F