Post on 06-Jul-2020

  • Gamifying Operational Excellence

    The Service Score Card

  • Agenda

    1 The Problem

    2 A Solution idea

    3 A Solution tour

    4 The results

    5 Takeaways, lessons learnt & questions

  • About me: Danny ☃ Lawrence

    “If it's not broken, I’ll fix it.”

    From Australia, on loan as

    Staff SRE @ LinkedIn

    jobs, companies, recruiter

    & Finder of encoding bugs

  • Good news, SRECON.

    You passed the ☃ test.

  • Some terms (before we really get started)

  • Operational Excellence: the effective and efficient delivery of information, technology, and services required by end users that add measurable value.

  • Operational Excellence: doing everything required to make sure all of your services are as fast and as reliable as possible.

  • Gamification: the application of game-design elements and game principles in non-game contexts.

  • Some background (LinkedIn SRE crash course)

  • LinkedIn SRE Crash Course

    Mostly Java

    Multitudes of services

    Doing lots of things

    Service-oriented architecture

    Everything talks to everything

    My direct team looks after 80+ services

    We have 200+ SREs

  • The Problem (What started this whole thing)

  • Problem 1: The GOOD & The BAD

  • BAD services wake me up

  • GOOD services let me sleep

  • What makes a GOOD service at LinkedIn is a moving target.

  • Technologies and dependencies change over time.

  • Some examples

    Upgrading dependencies & libraries

    Java / Jetty / Play / Tomcat

    Correct usage of TLS

    Switching databases / caches

    Migrate from SVN to Git

    Reduce application startup time

    Set up error budgeting

    True up the number of metrics

  • A GOOD service can turn into a BAD service if you are not checking it.

  • Unfortunately, BAD services do not magically turn into GOOD services.

  • Problem 2: Knowing what is BAD

  • Problem 3: Knowing why it’s BAD

  • Problem 4: Tribal knowledge about how to get to GOOD

  • The only thing we appear to hate more than not having documentation ... is writing documentation.

  • The Problem summary

  • BAD services wake me up

    Time will cause GOOD to turn BAD

    Hard to know what is BAD

    Hard to know why it is BAD

    Not sure how to fix the BAD

  • The Service ScoreCard (A solution)

  • In order to determine the health of the services we support, we define a list of production requirements.

  • Apply a weight to each requirement

  • Codify each requirement into a check.

  • Execute these checks for each service

  • Tally up the results for each service.

  • Grade the service from “F” to “A+”

  • Add all the services into a highscore system

  • Then publish those scores to the company
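
The scorecard steps above (weight each requirement, run the checks, tally, grade, rank) could be sketched roughly as follows. This is only an illustration: the `CheckResult` type, the grade boundaries, and the service names are assumptions made for the example, since the slides only say that grades run from "F" to "A+".

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    weight: int    # importance of the requirement
    score: float   # 1.0 = pass, 0.0 = fail, values in between = partial


# Hypothetical boundaries as (minimum percentage, grade); not from the talk.
GRADE_BOUNDARIES = [(97, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D")]


def service_percentage(results):
    """Weighted percentage of requirements met by one service."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    earned = sum(r.weight * r.score for r in results)
    return 100.0 * earned / total_weight


def grade(percentage):
    """Map a percentage onto a letter grade, falling back to "F"."""
    for minimum, letter in GRADE_BOUNDARIES:
        if percentage >= minimum:
            return letter
    return "F"


def leaderboard(services):
    """Rank services (name -> list of CheckResult) from best to worst."""
    scored = [(name, service_percentage(res)) for name, res in services.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, pct, grade(pct)) for name, pct in scored]


# Made-up services and check results, purely for illustration.
services = {
    "jobs-api": [CheckResult("eng_team", 5, 1.0), CheckResult("tls", 3, 0.0)],
    "search": [CheckResult("eng_team", 5, 1.0), CheckResult("tls", 3, 1.0)],
}
for name, pct, letter in leaderboard(services):
    print(f"{name}: {pct:.1f}% ({letter})")
```

Dividing by the total weight keeps the grade comparable between services even when they run a different number of checks.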

  • This is great, but how do I improve the score?

    How can I add check X into the system?

  • What makes a check?

  • Checks are one type of plugin.

    Fetch plugins gather data.

    Check plugins check the data.

  • We use the fetch plugin to gather remote data from:

    SVN, Git, configuration DBs, host databases, monitoring systems, build systems, deployment systems.

  • Basically, if we can fetch it, then we do so.

  • We build a giant context object.

  • The check plugin will look at our context object.
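
One way the context object might be assembled is sketched below. The slides do not show this part, so the `Context` class, the `build_context` helper, and the choice to discard data from a failed fetch are all assumptions for illustration. The attribute-style access is chosen so that an expression such as `ctx.ownership.eng_team` works and a missing field simply reads as `None`.

```python
class Context(dict):
    """Dict with attribute access; missing keys read as None (assumed behaviour)."""

    def __getattr__(self, name):
        if name.startswith("__"):
            # Leave Python's own protocol lookups (copy, pickle, ...) alone.
            raise AttributeError(name)
        value = self.get(name)
        return Context(value) if isinstance(value, dict) else value


def build_context(service_name, fetch_plugins):
    """Run every fetch plugin and merge its data into one context object.

    `fetch_plugins` maps a tag (e.g. "ownership") to a fetch function that
    returns (state, message, data).
    """
    ctx = Context(service_name=service_name)
    for tag, fetch in fetch_plugins.items():
        state, message, data = fetch(service_name)
        # Assumption: data from a failed fetch is discarded.
        ctx[tag] = data if state else {}
    return ctx


def fetch_ownership(service_name):
    # Stand-in for a real fetch plugin; returns canned data instead of calling a service.
    return True, "gathered data", {"eng_team": "jobs-eng", "sre_team": None}


ctx = build_context("jobs-api", {"ownership": fetch_ownership})
print(ctx.ownership.eng_team)
```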

  • All plugins are small Python scripts, where small is 10 to 30 LOC

  • Simply return 2 or 3 things.

    state*: True, False, None or 0.0 - 1.0

    message*: short string

    data: Python dict of interesting things.
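
A framework consuming these return values would need to turn the 2- or 3-tuple into something uniform before tallying. The helper below is a hypothetical sketch of that step; the name `normalize_result` and the reading of `None` as "not applicable, leave it out of the tally" are assumptions, not something the slides state.

```python
def normalize_result(result):
    """Turn a plugin's 2- or 3-tuple into (score, message, data).

    score is a float in [0.0, 1.0], or None when the plugin returned None.
    """
    if len(result) == 2:
        state, message = result
        data = {}
    elif len(result) == 3:
        state, message, data = result
    else:
        raise ValueError("expected (state, message) or (state, message, data)")

    if state is None:
        score = None  # assumption: None means "not applicable"
    elif isinstance(state, bool):
        # bool is tested before numbers because True/False are also ints.
        score = 1.0 if state else 0.0
    elif isinstance(state, (int, float)) and 0.0 <= state <= 1.0:
        score = float(state)
    else:
        raise ValueError(f"invalid state: {state!r}")
    return score, str(message), dict(data or {})


print(normalize_result((True, "jobs-eng")))
print(normalize_result((0.5, "half of the hosts upgraded", {"hosts": 4})))
```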

  • Example fetch plugin

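  • Fetch plugin for http://owners/

    import requests as r  # assumption: `r` on the slide is the requests library
    import ssc  # assumption: the scorecard framework is importable under this name

    @ssc.tags("ownership")
    def fetch_ownership(service_name):
        "Fetch all the ownership data of a service"
        # Ask the ownership service about this service by name.
        o = r.get("http://owners/" + service_name)
        # state, message, data
        return True, "gathered data", o.json()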

  • Example check plugin

  • @ssc.weight(5)
    @ssc.tags('ownership')
    @ssc.wiki('http://wiki/ssc_eng_owner')
    def check_eng_team(ctx):
        "ensure ENG ownership of a service"
        if ctx.ownership.eng_team:
            return True, ctx.ownership.eng_team
        return False, "missing eng_team"
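
The slides do not show how the `ssc` decorators work internally. One plausible reading is that they only attach metadata to the plugin function, which the framework later reads when scoring and reporting. The sketch below implements that guess with plain functions (`weight`, `tags`, `wiki`) rather than the real `ssc` module; the `report` helper and the idea of showing the wiki link on failure are likewise assumptions.

```python
from types import SimpleNamespace


def _annotate(**metadata):
    """Build a decorator that stores metadata as attributes on the plugin."""
    def decorator(func):
        for key, value in metadata.items():
            setattr(func, key, value)
        return func
    return decorator


def weight(value):
    return _annotate(weight=value)


def tags(*names):
    return _annotate(tags=set(names))


def wiki(url):
    return _annotate(wiki=url)


@weight(5)
@tags("ownership")
@wiki("http://wiki/ssc_eng_owner")
def check_eng_team(ctx):
    "ensure ENG ownership of a service"
    if ctx.ownership.eng_team:
        return True, ctx.ownership.eng_team
    return False, "missing eng_team"


def report(check, ctx):
    """Run one check and pair a failure with its wiki page."""
    state, message = check(ctx)[:2]
    hint = "" if state else f" (how to fix: {check.wiki})"
    return f"{check.__name__} [weight {check.weight}]: {message}{hint}"


# Usage with a hand-built stand-in for the context object.
orphan = SimpleNamespace(ownership=SimpleNamespace(eng_team=None))
print(report(check_eng_team, orphan))
```

Keeping the wiki link next to the check is one way to address the tribal-knowledge problem: a failing check can point straight at the page that explains the fix.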

  • @ssc.weight(5) @ss