gartner infrastructure and operations summit berlin 2015 - devops journey

Beginning the DevOps Journey in Real Money Gaming

Kelly Looney 27.11.2014

DevOps in Real Money Gaming
Context: €600M company – online sports betting, online casino, poker, other games

Two primary technologies combined via a business merger (turn of the century architecture)

• Sports – .NET/SQL Server
• Poker, Casino, and “Platform” – Java/Oracle

• Datacenters in Gibraltar, Vienna, and other points in Europe, now in the US
• Over 2000 servers in production
• 200 people in Ops and Infrastructure
• Development centers in Vienna, Ukraine, and Hyderabad
• Over 700 development team members



Different faces/rules for different markets

Monolithic App with many single points of failure

In 2013…the Challenge

Dev teams focused on horizontal components

Totally separate Ops, Maintenance, and Dev teams

Clashing cultures from Merger/Locations/Code bases

Up 24/7 with Millions of €/day wagered

In-house build, deploy, monitoring…

AppDynamics picture of the Beast

What we have done and are doing…
• Global Agile Transformation – classes, coaches, 96 Scrum teams
• Craig Larman, Luke Hohmann (Innovation Games)
• DTO (Damon Edwards, Alex Honor) for DevOps principles
• Now exploring SAFe

• Several organization changes
• Components -> Features -> Services
• Ops -> LeanOps -> Delivery Units


Cultural changes we have encouraged
• Old style Developers
• Responsibilities: Write code
• Focus: Know ONE THING really, really well
• Deep expertise = respect
• What we want now is Developers that:
• Understand our company goals
• Understand requirements and tests
• Write, build, integrate, and test code incrementally
• Can demonstrate and explain working systems
• Maintain their code in production
• Understand operations

Deep expertise is great, but varied knowledge is just as important

Wow, you want developers to do everything…

• First the right attitude… then
• Today's Tools and Processes:

1. Agile provides continuous “customer” access
2. Distributed versioning (typically Git) puts full source control into individual developers' hands
3. Continuous Integration isolates mistakes
4. Jenkins-Vagrant-Puppet-Chef-Saltstack pipelines make infrastructure and deployment mostly automatic regardless of complexity
• Deploy to Test, UAT, Staging, Production
5. Monitoring lets you see and assess your running service
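A minimal sketch of the staged deployment idea in point 4, assuming each stage has a pass/fail check. The stage names come from the slide; the `promote` function, the `checks` mapping, and the choice to let a stage with no registered check pass are all hypothetical, not the pipeline actually used.

```python
from typing import Callable, Dict, List

# Stage order taken from the slide: Test, UAT, Staging, Production.
STAGES = ["Test", "UAT", "Staging", "Production"]


def promote(build_id: str, checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Move a build through the stages in order, stopping at the first failed check.

    Returns the stages the build reached successfully.
    """
    reached: List[str] = []
    for stage in STAGES:
        # Assumption for illustration: a stage without a registered check passes.
        check = checks.get(stage, lambda _build: True)
        if not check(build_id):
            break  # a failure here keeps the build out of every later stage
        reached.append(stage)
    return reached
```

In a real pipeline each check would be a Jenkins job or similar; here they are plain callables so the control flow stays visible.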

How is that possible?


What we have done and are doing… tech
• Tool changes
• SVN -> Git, in-house deploy -> Jenkins/TeamCity, Puppet, Chef, Rundeck
• Bare metal -> VMware -> now headed to Docker/containers
• Monitoring… AppDynamics – more to come

• Architectural Principles
• Less centralized, fewer failure points
• Code to create a server is the asset, not the server
• Throw cheap machines, not faster CPUs or bigger DBs, at scaling problems

• Use RDBs when needed; otherwise avoid them
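One way to picture "code to create a server is the asset": the server is described as data, and the provisioning steps are derived from that description, so any machine can be rebuilt from it. This is an illustrative sketch only; `ServerSpec`, its fields, and the step strings are hypothetical and stand in for what Puppet or Chef would express.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical description of a server; field names are illustrative."""
    role: str
    packages: Tuple[str, ...]
    open_ports: Tuple[int, ...]


def provisioning_steps(spec: ServerSpec) -> List[str]:
    """Derive an ordered, repeatable list of steps from the spec.

    Sorting makes the output independent of the order the spec was written in,
    so the same spec always yields the same server.
    """
    steps = [f"install {pkg}" for pkg in sorted(spec.packages)]
    steps += [f"open port {port}" for port in sorted(spec.open_ports)]
    steps.append(f"tag role={spec.role}")
    return steps
```

Because the spec lives in version control, losing a server costs little: the steps can be replayed on a fresh cheap machine.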


Containers are changing hosting
• Virtualization efficiency and cost savings are obvious

• The most interesting issue is the separation of concerns presented
• “developer-land” vs infrastructure


What to do about quality?

• We pulled all sorts of people together
• Ops, Dev, CS, Business, Partners…

• “What do you think we can do to improve overall system quality?”
• #1 Answer: We need comprehensive monitoring

• Our system is so complex and so opaque we can’t really tell what is specifically wrong.
• Reworking our millions of lines of code to properly and consistently log will never happen…

• This led us to evaluate many different monitoring approaches and products
• We settled on AppDynamics, reasons:

• Advanced UI, very flexible
• One application to replace a variety of other solutions
• Aggregation of data was a huge cost saver

• #2 Quality issue: Testing environment stability and viability
• Expensive, not really “production-like” and not highly available

• Too elaborate for early testing and not close enough to production for late testing
• Forced to mix tests, which often polluted one another
• Infrastructure was an incredible blocker: no private or public cloud

First Steps: Workshops at each main development site


The Difference Monitoring has made…
1. Like a giant debugger for production issues

• Peer into what were previously opaque code bases
• Where are the stress points? Also surface the really dumb stuff.

• Identify intermittent issues that were hard to identify before
• “Working for me…”

2. Better resource planning
• We had lots of “over-solved” problems before
• How do things change during spikes in traffic?

3. Rollout actually helped us identify services that needed refactoring
• If the overhead of mature monitoring breaks your service…

4. Developers are starting to use AppDynamics to assess new designs
• It has uncovered a few things we were happy we did not deploy!

5. Gets the whole organization in touch with operations
• A huge DevOps goal realized…


Posted all around the organization


Automating Test

• It’s not “How many automated tests do I have?”
• We could have easily run days’ worth of tests whenever we wanted

• It’s “I have the right tests to quickly decide if I can move forward”

• Also BTW “We run Jenkins to do a build every night”
• That != Continuous Integration…

• Can you create a viable test environment, use it, then throw it away? (before it pollutes other tests…)
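The create/use/throw-away cycle can be sketched with a context manager. This is a simplified illustration under a loud assumption: a temporary directory stands in for a real test environment (VMs, containers, databases), and the name `throwaway_environment` is hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def throwaway_environment(prefix: str = "testenv-"):
    """Create an isolated environment, hand it to the caller, then destroy it."""
    # Stand-in for real provisioning: a fresh, uniquely named directory.
    root = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield root
    finally:
        # Runs even if the test raises, so nothing is left to pollute later tests.
        shutil.rmtree(root, ignore_errors=True)
```

Usage: `with throwaway_environment() as env:` followed by the test body; once the block exits, the environment no longer exists.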


What DevOps and CD mean for the organization
• The whole idea of holding off changes to retain stability gets turned on its head
• Change all the time and stay stable!

• Changes get smaller and smaller, but are constantly being deployed
• With small changes, integration issues become fairly simple

• Environments must proliferate along with associated infrastructure
• Ideally you need a new test environment to test every change – Create/Destroy

• Are your environments captured as code?
• Use Cloud services here, even if you don’t want to for production


