DevOps Chicago - The Game of Operations and the Operation of Games

The Game of Operations and The Operation of Games. Randy Shoup (@randyshoup, linkedin.com/in/randyshoup). DevOps Chicago Meetup, May 19, 2014.

Upload: randy-shoup

Post on 08-May-2015


DESCRIPTION

Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques he has been putting into practice at KIXEYE, including migrating to the cloud, organizing around services, and focusing on automation. He illustrates his points with war stories from operating large-scale services at Google and eBay. Please see companion video at https://vimeo.com/95841677.

TRANSCRIPT

Page 1: DevOps Chicago - The Game Of Operations and the Operation of Games

The Game of Operations and

The Operation of Games

Randy Shoup @randyshoup

linkedin.com/in/randyshoup

DevOps Chicago Meetup, May 19 2014

Page 2: DevOps Chicago - The Game Of Operations and the Operation of Games

Background

CTO at KIXEYE
• Real-time strategy games for web and mobile

Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service

Chief Engineer at eBay
• Multiple generations of eBay’s real-time search infrastructure

Page 3: DevOps Chicago - The Game Of Operations and the Operation of Games

Real-Time Strategy Games are …

• Real-time
• Spiky
• Computationally intensive
• Constantly evolving
• Constantly pushing boundaries

Technically and operationally demanding

Page 4: DevOps Chicago - The Game Of Operations and the Operation of Games

Operating Games: Goals

Player Fun
• If players aren’t playing, we don’t have a business
• If players aren’t having fun, we don’t have a business for long
• Fun includes game mechanics, feature set, quality, performance

Studio Velocity
• 8 *highly independent* game studios
• Different tech stacks, tool chains, phases of development

Developer Productivity and Satisfaction
• We are a vendor; the studios are our customers
• Must be *strictly better* than the alternatives of build, buy, borrow

Cost Efficiency
• More output for less

Page 5: DevOps Chicago - The Game Of Operations and the Operation of Games

The Game of Operations

Cloud
• All studios and services moving to AWS
• Strong focus on automation

Services
• Small, focused teams
• Clean, well-defined interface to customers

DevOps
• Developers behave like Ops
• Ops behaves like Developers

Page 6: DevOps Chicago - The Game Of Operations and the Operation of Games

The Game of Operations

Cloud

Services

DevOps

Page 7: DevOps Chicago - The Game Of Operations and the Operation of Games

Why Cloud? (The Obvious)

Provisioning Speed
• Minutes, not weeks
• Autoscaling in response to load

Near-Infinite Capacity
• No need to predict and plan for growth
• No need to defensively overprovision

Pay For What You Use
• No “utilization risk” from owning / renting
• If it’s not in use, spin it down

Page 8: DevOps Chicago - The Game Of Operations and the Operation of Games

Why Cloud? (The Less Obvious)

Instance Optimization Opportunities
• Instance shapes to fit most parts of the solution space (compute-intensive, IO-intensive, etc.)
• If the shape does not fit, try another

Service Quality
• Amazon and Google know how to run data centers
• Battle-tested and highly automated
• World-class networking, both cluster fabric and external peering

Page 9: DevOps Chicago - The Game Of Operations and the Operation of Games

Why Cloud? (The Fundamentals)

Right Side of History
• Almost impossible to beat Google / Amazon buying power or operating efficiencies
• 2010s in computing are like 1910s in electric power
• Soon it will be just as common to run your own data center as it is to run your own electric power generation (!)

Easy and Fun
• It Just Works ™
• Makes it easy to fall in love with infrastructure

Page 10: DevOps Chicago - The Game Of Operations and the Operation of Games

Autoscaling

Games are very spiky
• Very unpredictable
• Huge variability between peak and trough
• Hits are self-reinforcing

Services and clients have to “flex”
• Clients back off in response to latency
• Services grow / shrink based on load

Service Cluster == AWS Auto-Scale Group
• Scale up or down based on predefined metrics, thresholds
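
The two halves of "flex" on this slide can be sketched roughly as follows. This is only an illustration, not KIXEYE's implementation: the function names, the latency target, the CPU thresholds, and the group bounds are all assumptions made for the example, and a real Auto Scaling group would evaluate its metrics through the cloud provider rather than in application code.

```python
def next_poll_interval(interval, observed_latency, target_latency=0.2,
                       base=1.0, cap=60.0):
    """Client side: back off when the service looks slow.

    Doubles the wait between requests while latency is above the
    (assumed) target, and halves it again once latency recovers.
    All values are in seconds.
    """
    if observed_latency > target_latency:
        return min(cap, interval * 2)
    return max(base, interval / 2)


def desired_capacity(current, avg_cpu, low=30.0, high=70.0,
                     min_size=2, max_size=20):
    """Service side: threshold-based scaling decision.

    Adds one instance when average CPU is above `high`, removes one
    when it is below `low`, and always stays within the group bounds.
    """
    if avg_cpu > high:
        target = current + 1
    elif avg_cpu < low:
        target = current - 1
    else:
        target = current
    return max(min_size, min(max_size, target))
```

For example, `desired_capacity(4, 85.0)` asks for a fifth instance, while `desired_capacity(20, 85.0)` stays at the assumed maximum of 20.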

Page 11: DevOps Chicago - The Game Of Operations and the Operation of Games

Automation Work at KIXEYE

Build / Deploy Pipeline
• One button
• Puppet -> Packer -> AMI -> Asgard
• No-downtime red-black deployment
• Futures: canarying, auto-rollback

Manageability
• Flume -> ElasticSearch / Kibana for logging
• Shinken -> PagerDuty for monitoring and alerting
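
The idea behind a no-downtime red-black deployment can be sketched as a small simulation. `Cluster`, `Router`, and the health check below are hypothetical stand-ins invented for this example; they are not Asgard's API or the pipeline described on the slide. The point is only the ordering: the new cluster starts beside the old one, traffic moves only after it passes a health check, and the old cluster is kept so that rollback is just another switch.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A hypothetical group of instances running one build."""
    version: str
    healthy: bool = True
    serving: bool = False


@dataclass
class Router:
    """Hypothetical stand-in for a load balancer."""
    active: Cluster
    standby: list = field(default_factory=list)

    def red_black_deploy(self, new_version, health_check):
        """Start the new cluster beside the old one; switch only if healthy.

        Returns True when traffic moved to the new cluster.
        """
        candidate = Cluster(version=new_version)
        candidate.healthy = health_check(candidate)
        if not candidate.healthy:
            # The old cluster never stopped serving, so there is no downtime.
            return False
        old = self.active
        candidate.serving = True
        self.active = candidate
        old.serving = False
        # Keep the old cluster so a rollback is just another switch.
        self.standby.append(old)
        return True

    def rollback(self):
        """Switch traffic back to the most recent standby cluster."""
        previous = self.standby.pop()
        self.active.serving = False
        previous.serving = True
        self.active = previous
```

A caller would create `Router(active=Cluster("v1", serving=True))`, call `red_black_deploy("v2", check)`, and call `rollback()` if the new version misbehaves after the switch.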

Page 12: DevOps Chicago - The Game Of Operations and the Operation of Games

The Game of Operations

Cloud

Services

DevOps

Page 13: DevOps Chicago - The Game Of Operations and the Operation of Games

Service Teams

• Give teams autonomy
  • Freedom to choose technology, methodology, working environment
  • Responsibility for the results of those choices

• Hold them accountable for *results*
  • Give a team a goal, not a solution
  • Let team own the best way to achieve the goal

Page 14: DevOps Chicago - The Game Of Operations and the Operation of Games

KIXEYE Service Chassis

• Goal: Produce a “chassis” for building scalable game services

• Minimal resources, minimal direction
  • 3 people x 1 month
  • Consider building on open source projects

Team exceeded expectations
• Co-developed chassis, transport layer, service template, build pipeline, red-black deployment, etc.
• Operability and manageability from the beginning
• Heavy use of Netflix open source projects
• 15 minutes from no code to running service in AWS (!)
• Plan to open-source several parts of this work

Page 15: DevOps Chicago - The Game Of Operations and the Operation of Games

Micro-Services

• Simple
• Well-defined interface
• Single-purpose
• Modular and independent
• Small teams
• Autonomy and responsibility

[Diagram: services A, B, C, D, E]

Page 16: DevOps Chicago - The Game Of Operations and the Operation of Games

Transition to Building Services

Common Chassis
• Make it trivially easy to build and maintain a service

Define Service Interface (Formally!)
• Propose, Discuss, Agree

Prototype Implementation
• Simplest thing that could possibly work
• Client can integrate with prototype
• Implementor can learn what works and what does not

Real Implementation
• Throw away the prototype (!)

Rinse and Repeat
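
As a rough illustration of "define the interface formally, then prototype", the sketch below uses a Python `Protocol` as the agreed contract and an in-memory class as the throwaway prototype. The leaderboard service, its method names, and the client function are all hypothetical; the slides do not say which services or interface-definition tools were used.

```python
from typing import Protocol


class LeaderboardService(Protocol):
    """Hypothetical interface agreed between provider and client teams."""

    def submit_score(self, player_id: str, score: int) -> None: ...

    def top(self, n: int) -> list[tuple[str, int]]: ...


class InMemoryLeaderboard:
    """Prototype: the simplest thing that could possibly work."""

    def __init__(self) -> None:
        self._best: dict[str, int] = {}

    def submit_score(self, player_id: str, score: int) -> None:
        # Keep only each player's best score.
        self._best[player_id] = max(score, self._best.get(player_id, score))

    def top(self, n: int) -> list[tuple[str, int]]:
        # Highest score first; ties broken by player id for stable output.
        ranked = sorted(self._best.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


def show_podium(service: LeaderboardService) -> list[str]:
    """Client code written against the interface, not the implementation."""
    return [f"{name}: {score}" for name, score in service.top(3)]
```

Because `show_podium` depends only on the interface, the client can integrate against the prototype right away, and the prototype can later be thrown away and replaced by the real implementation without client changes.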

Page 17: DevOps Chicago - The Game Of Operations and the Operation of Games

Transition to Service Relationships

Vendor – Customer Relationship
• Friendly and cooperative, but structured
• Clear ownership and division of responsibility
• Customer can choose to use service or not (!)

Service-Level Agreement (SLA)
• Promise of service levels by the service provider
• Customer needs to be able to rely on the service, like a utility

Charging and Cost Allocation
• Charge customers for *usage* of the service
• Aligns economic incentives of customer and provider
• Motivates both sides to optimize
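
One simple way to charge for usage is to split a service's cost across its customers in proportion to what each consumed. The sketch below assumes that model; the function name, the studio names, and the numbers are made up for illustration, and the slides do not describe the actual charging scheme.

```python
def allocate_cost(total_cost, usage_by_customer):
    """Split a service's cost across customers in proportion to usage.

    `usage_by_customer` maps a customer name to any usage measure
    (requests, CPU-hours, and so on). Returns a name -> charge mapping.
    """
    total_usage = sum(usage_by_customer.values())
    if total_usage == 0:
        # Nobody used the service, so nobody is charged.
        return {name: 0.0 for name in usage_by_customer}
    return {
        name: total_cost * used / total_usage
        for name, used in usage_by_customer.items()
    }
```

With a total cost of 1000.0 and usage of 300 and 100 units, the two hypothetical studios would be charged 750.0 and 250.0. A customer that cuts its usage lowers its own bill, which is the incentive alignment the slide describes.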

Page 18: DevOps Chicago - The Game Of Operations and the Operation of Games

The Game of Operations

Cloud

Services

DevOps

Page 19: DevOps Chicago - The Game Of Operations and the Operation of Games

Instrumentation and Measurement

Instrument Everything
• Machine / instance stats: CPU, memory, I/O
• Software infrastructure stats: database, message queue
• Application stats: game client, game server, services

Make It Easy to Do the Right Thing ™
• Easy, reliable, low-latency
• Auto-tagged and searchable

Why?
• Measurement beats intuition every time; my own intuition is usually wrong
• If you need to ssh into a box, instrumentation failed you
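
A minimal sketch of "easy, auto-tagged and searchable" instrumentation might look like the following. The `Metrics` class is hypothetical and keeps data points in memory purely for illustration; a real setup would ship them to a metrics or logging backend. The tag names and metric names are assumptions as well.

```python
import socket
import time
from contextlib import contextmanager


class Metrics:
    """Hypothetical in-memory recorder; a real one would ship data elsewhere."""

    def __init__(self, service):
        # Tags added to every data point, so callers never have to remember them.
        self.default_tags = {"service": service, "host": socket.gethostname()}
        self.points = []

    def record(self, name, value, **tags):
        """Store one data point with the default tags plus any extra tags."""
        self.points.append({"name": name, "value": value,
                            "tags": {**self.default_tags, **tags}})

    @contextmanager
    def timed(self, name, **tags):
        """Record how long the wrapped block took, in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start, **tags)

    def search(self, **tags):
        """Return points whose tags include all the given key/value pairs."""
        return [p for p in self.points
                if all(p["tags"].get(k) == v for k, v in tags.items())]
```

Usage is one line per measurement, for example `metrics.record("queue_depth", 7, region="us-east")` or `with metrics.timed("handle_request", endpoint="/join"): ...`, which is the kind of low friction the slide asks for.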

Page 20: DevOps Chicago - The Game Of Operations and the Operation of Games

One Team (!)

• Act as one team across development, product, operations, etc.

• Solve problems instead of blaming and pointing fingers

• Political games are not as fun as real-time strategy games

Page 21: DevOps Chicago - The Game Of Operations and the Operation of Games

Everyone Is Responsible for Prod

Everyone’s incentives are aligned

Everyone is strongly motivated to have solid instrumentation and monitoring

Page 22: DevOps Chicago - The Game Of Operations and the Operation of Games

Organization: Learning Culture

Learn from mistakes and improve
• What did you do -> What did you learn
• Take emotion and personalization out of it

Encourage iteration and velocity
• “Failure is not falling down but refusing to get back up” – Theodore Roosevelt

Page 23: DevOps Chicago - The Game Of Operations and the Operation of Games

Google Blame-Free Post-Mortems

Post-mortem After Every Incident
• Document exactly what happened
• What went right
• What went wrong

Open and Honest Discussion
• What contributed to the incident?
• What could we have done better?

Engineers compete to take personal responsibility (!)

Page 24: DevOps Chicago - The Game Of Operations and the Operation of Games

Transition to DevOps

Organization
• Studios make user-visible games
• Services provide common endpoints

Training / Retraining
• Common bootcamp
• Train devs as Ops, Ops as devs

You Build It, You Run It
• Transition on-call
• Use primary / secondary on-call as apprenticeship

Page 25: DevOps Chicago - The Game Of Operations and the Operation of Games

Recap: The Game of Operations

Cloud

Services

DevOps

Page 26: DevOps Chicago - The Game Of Operations and the Operation of Games

Come Join Us!

KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam

@randyshoup

linkedin.com/in/randyshoup

slideshare.net/randyshoup