Chicago DevOps Meetup

Upload: henderson

Post on 02-Mar-2016


DESCRIPTION

Chicago meetup notes - devops group

TRANSCRIPT

  • 7/18/2019 Chicago DevOps Meetup

    1/110

    How Netflix Delivers Software

    July 8th, 2014

    Email: jedberg@{gmail,netflix}.com
    Twitter: @jedberg
    Web: www.jedberg.net
    Facebook: facebook.com/jedberg
    LinkedIn: www.linkedin.com/in/jedberg

  • 7/18/2019 Chicago DevOps Meetup

    2/110

    When your software fails...

  • 7/18/2019 Chicago DevOps Meetup

    3/110

    will your system survive?

  • 7/18/2019 Chicago DevOps Meetup

    5/110

    The Netflix way

    Fully automated build tools to test and make packages
    Fully automated machine image bakery
    Fully automated image deployment

  • 7/18/2019 Chicago DevOps Meetup

    6/110

    The Netflix way

    Everything is built for three
    Independent teams responsible for both Dev and Ops
    Redundancy through multi-region deployment

  • 7/18/2019 Chicago DevOps Meetup

    7/110

    Philosophy

  • 7/18/2019 Chicago DevOps Meetup

    8/110

    Freedom and Responsibility

    We hire responsible adults and keep rules and policies to a minimum
    Developers can change any code in production at any time
    And things don't break (usually)

  • 7/18/2019 Chicago DevOps Meetup

    9/110

    Automate all the things!

    http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html

  • 7/18/2019 Chicago DevOps Meetup

    10/110

    Automate all the things!

    Application startup
    Configuration
    Code deployment
    System deployment

  • 7/18/2019 Chicago DevOps Meetup

    11/110

    Automation

    Standard base image
    Tools to manage all the systems
    Reduce errors through reproducibility

  • 7/18/2019 Chicago DevOps Meetup

    12/110

    Shared state should be stored in a shared service

    Data on an instance should be replicated to other instances

  • 7/18/2019 Chicago DevOps Meetup

    13/110

    Build for three

    We hold a boot camp for new engineers to teach them how to build for a highly distributed environment.

  • 7/18/2019 Chicago DevOps Meetup

    16/110

    [Diagram labels]
    Backend services: Movie Ratings, Personalization Engine, User Info, Movie Metadata, Similar Movies, Reviews, A/B Test Engine
    Discovery API, Streaming API

  • 7/18/2019 Chicago DevOps Meetup

    17/110

    [Diagram labels]
    Backend services: Movie Ratings, Personalization Engine, User Info, Movie Metadata, Similar Movies, Reviews, A/B Test Engine
    Discovery API, Streaming API
    Content Encoding, CDN Management, QOS Logging, DRM
    OpenConnect, Edge Locations
    Browse, Play, Watch

  • 7/18/2019 Chicago DevOps Meetup

    18/110

    Highly aligned, loosely coupled

    Services are built by different teams who work together to figure out what each service will provide.
    The service owner publishes an API that anyone can use.

  • 7/18/2019 Chicago DevOps Meetup

    19/110

    Advantages to a Service Oriented Architecture

    Easier auto-scaling
    Easier capacity planning
    Identify problematic code-paths more easily
    Narrow in on the effects of a change
    More efficient local caching

  • 7/18/2019 Chicago DevOps Meetup

    20/110

    Freedom and Responsibility

    Developers deploy when they want
    They also manage their own capacity and autoscaling
    And fix anything that breaks at 4am!

  • 7/18/2019 Chicago DevOps Meetup

    21/110

    All system choices assume some part will fail at some point.

  • 7/18/2019 Chicago DevOps Meetup

    22/110

    The Monkey Theory

    Simulate things that go wrong
    Find things that are different
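
    As a rough illustration of the first idea (simulating things that go wrong), the sketch below picks one random instance per group and "terminates" it through a caller-supplied function. The names `pick_victims` and `unleash` and the fleet layout are hypothetical and are not taken from the talk or from any Netflix tool; a real run would call a cloud API instead of a local callback.

```python
import random
from typing import Callable, Dict, List


def pick_victims(groups: Dict[str, List[str]], rng: random.Random) -> Dict[str, str]:
    """Choose one random instance id from every non-empty group."""
    return {name: rng.choice(ids) for name, ids in groups.items() if ids}


def unleash(groups: Dict[str, List[str]],
            terminate: Callable[[str], None],
            rng: random.Random) -> List[str]:
    """Terminate one instance per group and return the ids that were killed."""
    victims = pick_victims(groups, rng)
    for instance_id in victims.values():
        terminate(instance_id)
    return sorted(victims.values())


if __name__ == "__main__":
    # Hypothetical fleet: group name -> instance ids.
    fleet = {"api": ["i-a1", "i-a2", "i-a3"], "web": ["i-w1", "i-w2"], "idle": []}
    killed: List[str] = []
    print(unleash(fleet, killed.append, random.Random(42)))
```

    Passing the random source and the terminate function in as arguments keeps the sketch testable: a service that survives this in a test environment has at least some evidence that it tolerates losing a single instance.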

  • 7/18/2019 Chicago DevOps Meetup

    23/110

    Execution

  • 7/18/2019 Chicago DevOps Meetup

    25/110

    AWS

    Netflix OSS

    Netflix Application Code

  • 7/18/2019 Chicago DevOps Meetup

    26/110

    AWS

    Netflix OSS

    YOUR Application Code

  • 7/18/2019 Chicago DevOps Meetup

    27/110

    What AWS Provides

    Instances
    Machine Images
    Elastic IPs
    Load Balancers
    Security groups / Autoscaling

  • 7/18/2019 Chicago DevOps Meetup

    28/110

    AWS

    Netflix OSS

    YOUR Application Code

  • 7/18/2019 Chicago DevOps Meetup

    29/110

    Netflix built a global PaaS

    Service Oriented Architecture
    HTTP/REST interfaces between services

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    30/110

    Netflix PaaS features

    Supports all regions and zones
    Multiple accounts
    Cross region/account replication
    Internationalized, localized and GeoIP routed
    Advanced key management
    Autoscaling with 1000s of instances
    Monitoring and alerting on millions of metrics

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    32/110

    Open Source at Netflix

  • 7/18/2019 Chicago DevOps Meetup

    33/110

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    37/110

    The Monkey Theory

    Simulate things that go wrong
    Find things that are different

  • 7/18/2019 Chicago DevOps Meetup

    39/110

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    41/110

    Blueprint for the rest of the platform libraries
    Pluggable architecture

  • 7/18/2019 Chicago DevOps Meetup

    43/110

    On instance software load balancer
    Zone aware / Zone affinity
    Handles retry logic
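
    The three bullets above can be sketched as a small client-side balancer: it keeps a server list per zone, prefers the caller's own zone, and retries on the next server when a call fails. `ZoneAwareBalancer` and its methods are hypothetical names for illustration only, not the API of any Netflix library, and the round-robin rotation is an assumed policy.

```python
from typing import Callable, Dict, List, Optional


class ZoneAwareBalancer:
    """Round-robin over servers, preferring the caller's own zone."""

    def __init__(self, servers: Dict[str, List[str]], local_zone: str) -> None:
        self.servers = servers          # zone name -> server addresses
        self.local_zone = local_zone
        self._counter = 0

    def candidates(self) -> List[str]:
        """Local-zone servers first, then every other zone as fallback."""
        local = list(self.servers.get(self.local_zone, []))
        remote = [s for zone, hosts in sorted(self.servers.items())
                  if zone != self.local_zone for s in hosts]
        if local:
            # Rotate the local list so successive calls spread the load.
            start = self._counter % len(local)
            local = local[start:] + local[:start]
            self._counter += 1
        return local + remote

    def call(self, request: Callable[[str], str], max_attempts: int = 3) -> str:
        """Try up to max_attempts servers, moving on when one raises."""
        last_error: Optional[Exception] = None
        for server in self.candidates()[:max_attempts]:
            try:
                return request(server)
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("all attempts failed") from last_error
```

    Because the balancer runs inside the calling process, there is no extra network hop, and traffic only crosses zones when the local zone cannot answer.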

  • 7/18/2019 Chicago DevOps Meetup

    45/110

    Global variables
    Support for staged rollout
    Feature flags
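
    One way to picture how these three items fit together is a property store that can change at runtime, plus a stable hash that decides which users fall inside a rollout percentage. Everything here (`DynamicProperties`, `bucket`, `is_enabled`, and the `.enabled` / `.rollout_percent` key names) is a hypothetical sketch, not the configuration library the slide describes.

```python
import hashlib
from typing import Dict


class DynamicProperties:
    """In-memory stand-in for a central, runtime-changeable property store."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._values[key] = value

    def get_bool(self, key: str, default: bool = False) -> bool:
        raw = self._values.get(key)
        return default if raw is None else raw.lower() == "true"

    def get_int(self, key: str, default: int = 0) -> int:
        raw = self._values.get(key)
        return default if raw is None else int(raw)


def bucket(feature: str, user_id: str) -> int:
    """Map (feature, user) to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(props: DynamicProperties, feature: str, user_id: str) -> bool:
    """On when the flag is true and the user's bucket is below the rollout percentage."""
    if not props.get_bool(f"{feature}.enabled"):
        return False
    percent = props.get_int(f"{feature}.rollout_percent", 100)
    return bucket(feature, user_id) < percent
```

    Hashing the feature name together with the user id means a given user stays in or out of a rollout as the percentage grows, and different features do not always pick the same early users.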

  • 7/18/2019 Chicago DevOps Meetup

    46/110

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    48/110

    Application to instance mapping

    Heartbeat to keep track of health
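
    A minimal sketch of those two ideas, assuming a registry that maps application names to instance ids and treats an instance as healthy only while its heartbeats keep arriving. The `Registry` class, its method names, and the 90-second default TTL are illustrative assumptions, not the behavior of the actual service.

```python
from typing import Dict, List


class Registry:
    """Maps application names to instances and drops instances that stop heartbeating."""

    def __init__(self, ttl_seconds: float = 90.0) -> None:
        self.ttl_seconds = ttl_seconds  # assumed value, for illustration only
        # app name -> {instance id -> time of last heartbeat}
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, app: str, instance_id: str, now: float) -> None:
        """Register the instance on first call, renew it on later calls."""
        self._last_seen.setdefault(app, {})[instance_id] = now

    def instances(self, app: str, now: float) -> List[str]:
        """Return instances whose last heartbeat is within the TTL."""
        seen = self._last_seen.get(app, {})
        return sorted(i for i, t in seen.items() if now - t <= self.ttl_seconds)
```

    The clock is passed in explicitly so the expiry logic can be exercised without sleeping; a client would look up `instances("api", now)` and hand the result to its load balancer.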

  • 7/18/2019 Chicago DevOps Meetup

    50/110

    DQ Transport Routing

    Suro

    etc

    Eventbus

    Druid


  • 7/18/2019 Chicago DevOps Meetup

    51/110

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    53/110

    Why Bake?

    Traditional: launch OS, install packages, install app (Generic AMI Instance)
    Netflix: launch OS + app (App AMI Instance)

  • 7/18/2019 Chicago DevOps Meetup

    54/110

    Getting Baked

    [Build pipeline diagram labels]
    Perforce / Git (libraries, source)
    Ant targets, Ivy, Groovy all over
    Jenkins: sync, resolve, build, compile, report, publish, test
    Artifactory: snapshot / release, libraries / apps
    app bundles

  • 7/18/2019 Chicago DevOps Meetup

    55/110

    Base Image Baking

    [Diagram labels]
    Yum / Apt
    Linux: CentOS, Fedora, Ubuntu
    RPMs: Apache, Java...
    AWS: ec2 slave instances, S3 / EBS
    Bakery: foundation AMI, mount, install, snapshot, base AMI
    Ready for app bake

  • 7/18/2019 Chicago DevOps Meetup

    56/110

    App Image Baking

    [Diagram labels]
    Jenkins / Yum / Artifactory
    Linux, Apache, Java, Tomcat
    app bundle
    AWS: ec2 slave instances, S3 / EBS
    Bakery: base AMI, mount, install, snapshot, app AMI
    Ready to launch!

  • 7/18/2019 Chicago DevOps Meetup

    57/110

    app AMI

    Linux Base AMI (CentOS or Ubuntu)
    Java
    Tomcat
    Optional Apache
    Monitoring
    Log rotation to S3
    GC and thread dump logging
    Application war file, base servlet, platform, interface jars for dependent services
    Healthcheck, status servlets, JMX interface, Servo autoscale

  • 7/18/2019 Chicago DevOps Meetup

    58/110

    Application war file

  • 7/18/2019 Chicago DevOps Meetup

    59/110

    Linux Base AMI (CentOS or Ubuntu)
    Java
    JBoss
    Optional Apache
    Monitoring
    Log rotation to S3
    GC and thread dump logging
    Application war file, base servlet, platform, interface jars for dependent services
    Healthcheck, status servlets, JMX interface, Servo autoscale

  • 7/18/2019 Chicago DevOps Meetup

    60/110

    Linux Base AMI (CentOS or Ubuntu)
    Python
    Bottle
    Optional Apache
    Monitoring
    Log rotation to S3
    Logging
    Application file, base server, platform, interface libs for dependent services

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    61/110

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    65/110

    Deploying Code; Step 1

  • 7/18/2019 Chicago DevOps Meetup

    68/110

    [Diagram labels]
    Auto Scaling Group
    Launch Configuration
    Security Group
    Amazon Machine Image
    Instances
    Load Balancer

  • 7/18/2019 Chicago DevOps Meetup

    70/110

    Netflix has moved the granularity from the instance to the cluster

  • 7/18/2019 Chicago DevOps Meetup

    72/110

    Data is the most important asset Netflix has. It's what differentiates us from our competitors.

    Netflix OSS

  • 7/18/2019 Chicago DevOps Meetup

    74/110

    EVCache

    Wrapper on top of memcached
    Automatically replicates writes to multiple regions
    Pulls cache data intelligently via zone affinity
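
    The write-everywhere, read-locally pattern in these bullets can be sketched as follows. This is not EVCache's API: `ReplicatedCache` is a hypothetical class, plain dictionaries stand in for memcached clusters, and the sketch replicates across zones rather than regions to keep it short.

```python
from typing import Dict, List, Optional


class ReplicatedCache:
    """Writes go to every replica; reads try the local zone first."""

    def __init__(self, zones: List[str], local_zone: str) -> None:
        if local_zone not in zones:
            raise ValueError("local_zone must be one of zones")
        self.local_zone = local_zone
        # One plain dict per zone stands in for a memcached cluster.
        self.replicas: Dict[str, Dict[str, str]] = {z: {} for z in zones}

    def set(self, key: str, value: str) -> None:
        """Replicate the write to all zones."""
        for store in self.replicas.values():
            store[key] = value

    def get(self, key: str) -> Optional[str]:
        """Read locally; fall back to other zones on a local miss."""
        order = [self.local_zone] + [z for z in self.replicas if z != self.local_zone]
        for zone in order:
            value = self.replicas[zone].get(key)
            if value is not None:
                return value
        return None
```

    Paying for the extra writes up front means a read almost never has to leave the local zone, and losing one replica does not empty the cache.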

  • 7/18/2019 Chicago DevOps Meetup

    75/110

    Cassandra

  • 7/18/2019 Chicago DevOps Meetup

    76/110

    Why Cassandra?

    Availability over consistency
    Writes over reads
    We know Java
    Open source + support

  • 7/18/2019 Chicago DevOps Meetup

    77/110

    Using Cassandra at Netflix

    Priam
    Zero touch auto-config
    State management
    Token assignment
    Node replacement
    Backup/restore to/from S3

    Astyanax
    OO abstraction to Cassandra
    Multi-region support
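
    To make "token assignment" concrete: each Cassandra node owns a position (token) on a ring, and spacing the tokens evenly gives every node a similar share of the data. The sketch below shows only that generic even-spacing arithmetic. It is not Priam's actual scheme, the function name is hypothetical, and the default ring size of 2**127 is an assumption modeled on a 127-bit token space.

```python
from typing import List


def evenly_spaced_tokens(node_count: int, ring_size: int = 2 ** 127) -> List[int]:
    """Return one token per node, spaced evenly around a ring of ring_size positions."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * ring_size // node_count for i in range(node_count)]
```

    Automating this step matters because a manually mistyped token leaves one node with a disproportionate slice of the ring.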

  • 7/18/2019 Chicago DevOps Meetup

    78/110

    Cassandra Architecture

  • 7/18/2019 Chicago DevOps Meetup

    79/110

    Going Multi-region

  • 7/18/2019 Chicago DevOps Meetup

    80/110

    Leveraging Multi-region

    100% uptime is theoretically possible.
    You have to replicate your data
    This will cost money

  • 7/18/2019 Chicago DevOps Meetup

    84/110

    us-east-1, us-west-2, eu-west-1, etc

  • 7/18/2019 Chicago DevOps Meetup

    87/110

    What's going on?!

  • 7/18/2019 Chicago DevOps Meetup

    88/110

    Alert Systems

    [Diagram labels]
    Atlas alerting
    api
    Central Event Gateway
    Paging Service
    Amazon SES
    CORE Agent
    Other Teams' Agent

  • 7/18/2019 Chicago DevOps Meetup

    89/110

    Central Event Gateway

    Parse raw alerts, match application to owner
    Add image captures and links to related graphs for easy mobile use
    Send to the right service based on priority
    Register the event in Chronos, the timeline application
    Correlate low priority alerts and generate new high priority alerts
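
    A toy version of three of these steps (parse, match to owner, route by priority, plus a simple correlation rule) might look like the sketch below. The `app|priority|message` wire format, the threshold of three, the "page" and "email" channel names, and all function names are invented for illustration; the slide does not describe the real formats or rules.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    app: str
    priority: str  # "high" or "low"
    message: str


def parse_alert(raw: str) -> Alert:
    """Parse a raw 'app|priority|message' line (a made-up wire format)."""
    app, priority, message = raw.split("|", 2)
    return Alert(app.strip(), priority.strip().lower(), message.strip())


def correlate(alerts: List[Alert], threshold: int = 3) -> List[Alert]:
    """Emit one new high priority alert per app with >= threshold low priority alerts."""
    low_counts = Counter(a.app for a in alerts if a.priority == "low")
    return [Alert(app, "high", f"{count} correlated low priority alerts")
            for app, count in sorted(low_counts.items()) if count >= threshold]


def route(raw_alerts: List[str], owners: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (channel, owner, message) per alert; high priority pages, low priority emails."""
    alerts = [parse_alert(r) for r in raw_alerts]
    alerts += correlate(alerts)
    return [("page" if a.priority == "high" else "email",
             owners.get(a.app, "unowned"), a.message) for a in alerts]
```

    The correlation step is what keeps a pile of individually ignorable alerts from hiding a real problem: enough low priority noise from one application becomes a page for its owner.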

  • 7/18/2019 Chicago DevOps Meetup

    91/110

    Metrics in Production

    796B daily metric points
    Peaks at 1.4B / min
    50% daily metric churn

  • 7/18/2019 Chicago DevOps Meetup

    92/110

    What is a metric?

    com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
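
    Reading the example name, the trailing segments look like `key-value` pairs (device type, client version, UI version, geo) attached to a dotted base name. The sketch below splits a name on that assumption. The rule "a segment containing a hyphen is a tag" is a guess from this one example, and `parse_metric` is a hypothetical helper, not part of any Netflix tool.

```python
from typing import Dict, List, Tuple


def parse_metric(name: str) -> Tuple[str, Dict[str, str]]:
    """Split a dotted metric name into a base name and key-value tags.

    Assumption: a segment containing '-' is a 'key-value' tag, split on the first '-'.
    """
    base_parts: List[str] = []
    tags: Dict[str, str] = {}
    for segment in name.split("."):
        if "-" in segment:
            key, value = segment.split("-", 1)
            tags[key] = value
        else:
            base_parts.append(segment)
    return ".".join(base_parts), tags
```

    Each distinct combination of tag values is its own time series, which is one reason the daily point counts on the previous slide get so large.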

  • 7/18/2019 Chicago DevOps Meetup

    93/110

    How we built it

    Built our own big data system
    Based on S3 and EMR
    Fewer copies, lower resolution, and slower retrieval speed based on age of data

  • 7/18/2019 Chicago DevOps Meetup

    94/110

    Self Serve is the Key

    Developers choose:
    What metrics to submit
    What graphs they put on their dashboards
    What to alert on

  • 7/18/2019 Chicago DevOps Meetup

    95/110

    Example Alert Config

  • 7/18/2019 Chicago DevOps Meetup

    96/110

    Atlas

  • 7/18/2019 Chicago DevOps Meetup

    97/110

    When something breaks...

  • 7/18/2019 Chicago DevOps Meetup

    101/110

    Breakdown of an outage

  • 7/18/2019 Chicago DevOps Meetup

    102/110

    Breakdown of an outage

  • 7/18/2019 Chicago DevOps Meetup

    103/110

    Change control, the good

    Tells you what changed
    Tells you what's about to change
    Great for coordination when one change gates another change

  • 7/18/2019 Chicago DevOps Meetup

    104/110

    Change control, the bad

    It's manual
    It expresses intent, not reality
    It forces you to serialize your changes to an extent

  • 7/18/2019 Chicago DevOps Meetup

    105/110

    Breakdown of an outage

  • 7/18/2019 Chicago DevOps Meetup

    107/110

    Just a quick reminder...

    (Some of) Netflix is open source:
    https://netflix.github.io

  • 7/18/2019 Chicago DevOps Meetup

    108/110

    Netflix is hiring!

    If you like what you see here, feel free to reach out!

  • 7/18/2019 Chicago DevOps Meetup

    109/110

    Questions?


  • 7/18/2019 Chicago DevOps Meetup

    110/110

    Getting in touch

    Email: jedberg@{gmail,netflix}.com
    Twitter: @jedberg
    Web: www.jedberg.net
    Facebook: facebook.com/jedberg
    LinkedIn: www.linkedin.com/in/jedberg