continuous integration with amazon ecs and docker

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tim Secor - Manager, Developer Productivity 8/11/2016 Continuous Integration with ECS and Docker



Topics

• Who is Okta

• Okta Engineering: how do we work, how do we ship our code?
• The Challenge of the Developer Productivity Team
• A CI System with Amazon EC2 Container Service and Docker

Okta: Connect Everything

What We Do
• Connects all users, devices, applications, and organizations
• SSO, Adaptive MFA, Provisioning, Universal Directory, Mobility
• The broadest and deepest application network

What We Believe
• We believe that connecting everything will make organizations more productive and more secure.

We Make Customers Successful
• Leader: Okta (Magic Quadrant)
• Leader: Okta (Forrester Wave)

© Okta and/or its affiliates. All rights reserved. Okta Confidential

Millions of people use Okta every day

Thousands of enterprises use Okta to connect to Adobe’s Creative Cloud

Thousands of Enterprise Customers

• Ed, Gov, Non-Profit
• Services
• Media
• Consumer
• Technology
• Manufacturing, Energy
• Finance
• Cloud
• Health

Okta Application Network

• Universal Directory: Extensible Profiles, Attribute Transformations, Directory Integration and AD Password Management
• Single Sign On: Secure SSO for All Your Web Apps, On-prem and Cloud, with Flexible Policy, from Any Device
• Adaptive MFA: Contextual Access Policies, Modern Factors, Adaptive Authentication, Integrations for Apps and VPNs
• Provisioning: Lifecycle Management, Cloud & On-prem App Integration, Mastering from Apps, Directory Provisioning, Rules, Workflow, Reporting
• Mobility Management: Tight User Identity Integration, Device Based Contextual Access, Light-weight Management

Okta IT & Platform products

The most reliable IDaaS available

• Never taken offline for upgrades
• Redundant and scalable
• okta.com/trust

A Platform Architecture For Scale

[Diagram: load balancers, app servers, and data tier, each shown as A, B, C across DC1 and DC2]

Global Datacenters

Engineering

Okta Engineering: How Do We Work, How Do We Ship Our Code?

• 200 engineers, split into teams with embedded specialists
• 1-week sprints, with weekly deploys to production
• Capability to do more than one hotfix per day at customers’ request or for bugs found in CI or pre-prod
• Every merge to master is a potential release candidate

Okta Engineering: How Do We Test Our Code?

• Every topic branch goes through the same rigor in testing as a release candidate.
• Passing automated tests is enforced at commit time.
• Largest repo: 30K tests, takes 60 minutes (22 parallel runs)
• Smallest repo: 100 tests, 5 minutes
• The Developer Productivity team is responsible for supporting engineering.

Challenge of Developer Productivity Team

• Developer experience: developers expect fast turnaround time and reliable results.
• Quality: we need to run all the tests required to guarantee quality.
• Cost: we need to run an infrastructure that is as cost-effective as possible.
• Cloud First: we aim to use cloud services first, wherever possible.

Problems

CI using Open Source, Monolithic Applications

Vision

• Clean testing environments: isolate test environments from others, for parallel and serial runs.
• Dynamic worker scaling: workers should survive the loss of their build server; the worker pool should scale quickly; the number of workers should not affect the memory footprint of the build server.
• Spot instances for cost: run our services at cheaper rates, as we have many short-lived tasks and could certainly handle a few failures.
• Versioned testing: enable testing of infrastructure changes in topic branches.
• Improved queuing system: should survive build server reboots; shouldn’t be tied to specific workers or build servers; centralized; should have good visibility; re-queuing of lost tasks.
• Less infrastructure flakiness: push testing and creation of test machines to developers.
• The correct privileges, to maintain security: launch tasks in secure environments.
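One way to picture the "re-queuing of lost tasks" goal is a queue that hands out jobs on a time-limited lease and puts a job back if the worker never confirms completion. The sketch below is an in-memory illustration only; the class name LeaseQueue, its methods, and the lease length are hypothetical and are not taken from the deck.

```python
import time
from collections import deque


class LeaseQueue:
    """In-memory stand-in for a central job queue with lease-based re-queuing."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._ready = deque()
        self._leased = {}  # job -> time at which its lease expires
        self._lease_seconds = lease_seconds
        self._clock = clock

    def put(self, job):
        self._ready.append(job)

    def take(self):
        """Hand the next job to a worker, or return None if nothing is ready."""
        self._requeue_expired()
        if not self._ready:
            return None
        job = self._ready.popleft()
        self._leased[job] = self._clock() + self._lease_seconds
        return job

    def ack(self, job):
        """The worker finished the job, so the lease can be forgotten."""
        self._leased.pop(job, None)

    def _requeue_expired(self):
        # A lease that ran out means the worker was lost; make the job ready again.
        now = self._clock()
        for job, expiry in list(self._leased.items()):
            if expiry <= now:
                del self._leased[job]
                self._ready.append(job)
```

Because the queue state lives outside any worker or build server, a lost spot instance only delays a job by one lease period instead of dropping it.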

Solutions

EC2 Container Service and Docker

• Amazon Web Services + Java app tailored to Okta process
• Immutable and disposable build workers: created for one-time use, destroyed when the job is done
• Near ZERO cost on weekends, scales with load
• EC2 Container Service allows us to maximize usage of EC2 instances
• Same containers for multiple types and numbers of builds
• Same Machine Image can run multiple Docker images

Custom Reporting

Docker

• http://www.docker.com/what-docker#/VM

Docker Update

• Update the Dockerfile; our CI system builds the new image and uploads it to our repository
• Update the task definition to roll the change out to the cluster

Dockerfile

FROM docker.aue1d.saasure.com/okta-base:2.0
MAINTAINER Okta

RUN useradd -d /home/container_user -m -s /bin/bash container_user

# Install wget, tar, hostname
RUN yum install -y wget tar hostname

# Install Java 8
RUN yum install -y java-1.8.0-oracle-1.8.0_31

RUN mkdir -p /opt/sage
RUN mkdir -p /var/log/sage
RUN chown container_user /var/log/sage

ADD conf/* /opt/sage/conf/
ADD core/target/core-*.jar /opt/sage/sage.jar

EXPOSE 8882 8883

USER container_user

CMD java $OKTA_SAGE_JAVA_ARGS -jar /opt/sage/sage.jar server /opt/sage/conf/sage.yml

Docker Security Conventions

• Container repository: only allow containers from the internal repository
• Security scanning of containers: JFrog Xray
• Process monitoring on Docker host: cAdvisor from Google
• Secrets or any form of config NEVER baked into containers
• Start from a minimal, audited base OS
• Run container as non-privileged user with user namespaces (Docker 1.10+)
• Monitor alas.aws.amazon.com for critical updates

Docker Source Conventions

3 categories of container definitions

1. “Library” definitions used as the basis for building other images

2. Third-party service definitions e.g. Zookeeper or Elasticsearch

3. Internal service definitions

Repo per internal service

• Dockerfile in same repo => image versioned with code

• Docker compose for running dependent services

• Pegged versions (no builds)

Single repo for library and third-party service definitions

Docker Build Conventions

Integration tests run against code running in container

Build owns creating an immutable version and publishing it to the artifact server

Strict rules around “FROM” clause

• Must point at internal artifact server
• Must be tagged following SEMVER-SHORT_SHA convention
• Never allow a missing tag or the “latest” tag, so that builds stay repeatable
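The “FROM” rules above lend themselves to an automated check in the build. The following is a minimal sketch, not the deck's actual tooling: the function name check_from_lines is hypothetical, the registry host is copied from the Dockerfile slide, and the exact shape of SEMVER-SHORT_SHA (three numeric parts, a dash, seven hex characters) is an assumption.

```python
import re

# Assumption: the internal registry host, copied from the Dockerfile slide.
INTERNAL_REGISTRY = "docker.aue1d.saasure.com"
# Assumption: SEMVER-SHORT_SHA looks like 1.4.2-3fa9c1e (7-character hex SHA).
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+-[0-9a-f]{7}$")


def check_from_lines(dockerfile_text):
    """Return a list of human-readable violations, one or more per bad FROM line."""
    problems = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        image = parts[1]
        name, sep, tag = image.rpartition(":")
        # A ":" followed by "/" is a registry port, not a tag separator.
        if not sep or "/" in tag:
            name, tag = image, ""
        if not name.startswith(INTERNAL_REGISTRY + "/"):
            problems.append(f"line {number}: {image} is not from {INTERNAL_REGISTRY}")
        if not tag:
            problems.append(f"line {number}: {image} has no tag")
        elif tag == "latest":
            problems.append(f'line {number}: {image} uses the "latest" tag')
        elif not TAG_PATTERN.match(tag):
            problems.append(f"line {number}: tag {tag} is not SEMVER-SHORT_SHA")
    return problems
```

Under this assumed pattern the okta-base:2.0 tag from the Dockerfile slide would be flagged, so the pattern would need adjusting to whatever the real convention is.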

Docker Build Process


Logging and monitoring

• Logging

• All output streams pipe to STDOUT/STDERR of the running process

• Log forwarding is provided by underlying host

• Log entries contain

• Host

• Container Id

• Image name & version

• Request Id

• Metrics

• Host level, generic container metrics provided by host

• App level metrics published directly to well defined endpoints
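As an illustration of the logging conventions above, a process can write one structured entry per line to STDOUT and leave forwarding to the host. This is a sketch under assumptions: the JSON layout, the field names, the logger name "ci-worker", and the helper build_logger are all hypothetical; only the list of fields (host, container id, image name and version, request id) comes from the slide.

```python
import json
import logging
import sys


class ContextJsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying the container context."""

    def __init__(self, host, container_id, image):
        super().__init__()
        self._context = {"host": host, "container_id": container_id, "image": image}

    def format(self, record):
        entry = dict(self._context)
        entry["level"] = record.levelname
        entry["message"] = record.getMessage()
        # request_id is attached per call through the `extra` argument.
        entry["request_id"] = getattr(record, "request_id", None)
        return json.dumps(entry, sort_keys=True)


def build_logger(host, container_id, image, stream=sys.stdout):
    """Create a logger that writes JSON lines to the given stream (STDOUT by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ContextJsonFormatter(host, container_id, image))
    logger = logging.getLogger("ci-worker")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as log.info("build started", extra={"request_id": "req-42"}) then produces a single line that the host-level forwarder can ship without parsing free text.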

Amazon EC2 Container Service Host Management

Userdata installs:

• Slave terminator – T-800

• Base docker images an option

• Credentials – from s3

• Splunk Forwarder – logging

• Cluster target

• Cache – code and libs

Amazon EC2 Container Service

Identity and Access Management separation per service

• Either service per cluster or use new Identity and Access Management for Elastic Container Service functionality

Sharing the Docker daemon to allow running Docker within Docker

Pre-fetching large data blobs and making them available on the hosts is an option

Multiple containers: mysql, redis, kinesalite

Task Definitions

{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1",
  "containerDefinitions": [
    {
      "memory": 15000,
      "essential": true,
      "mountPoints": [
        {
          "containerPath": "/usr/bin/docker",
          "sourceVolume": "docker_daemon",
          "readOnly": null
        },
        {
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "docker_socket",
          "readOnly": null
        }
      ]
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "name": "docker_socket"
    },
    {
      "host": {
        "sourcePath": "/usr/bin/docker"
      },
      "name": "docker_daemon"
    }
  ],
  "family": "base-container-box-task"
}

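A task definition like this one only works if every mount point names a volume that is declared in the same document. The check below is a small sketch of that consistency rule; the function name undeclared_volumes is hypothetical, and the dictionary is a trimmed copy of the slide's task definition.

```python
def undeclared_volumes(task_definition):
    """Return sourceVolume names used by containers but missing from "volumes"."""
    declared = {volume["name"] for volume in task_definition.get("volumes", [])}
    missing = set()
    for container in task_definition.get("containerDefinitions", []):
        for mount in container.get("mountPoints", []):
            if mount["sourceVolume"] not in declared:
                missing.add(mount["sourceVolume"])
    return sorted(missing)


# Trimmed copy of the task definition from the slide.
TASK_DEFINITION = {
    "family": "base-container-box-task",
    "containerDefinitions": [
        {
            "memory": 15000,
            "essential": True,
            "mountPoints": [
                {"containerPath": "/usr/bin/docker", "sourceVolume": "docker_daemon"},
                {"containerPath": "/var/run/docker.sock", "sourceVolume": "docker_socket"},
            ],
        }
    ],
    "volumes": [
        {"host": {"sourcePath": "/var/run/docker.sock"}, "name": "docker_socket"},
        {"host": {"sourcePath": "/usr/bin/docker"}, "name": "docker_daemon"},
    ],
}
```

Mounting the host's Docker binary and socket is what lets a build container start sibling containers on the same host, as described on the previous slide.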
Clean Testing Environments

• Docker images

• Nearly instant machine refresh

• Easy for users to create and upload images that have been tested to work locally
• Efficient machine use
• Amazon EC2 Container Service with EC2 Container Registry and private repository backend

Docker Start Up


Dynamic Worker Scaling

[Diagram: Simple Queue Service, Simple Notification Service, Lambda (Scaling), Lambda (Bin Packing), EC2 Container Service]

• Lambda allocates jobs using bin packing
• This is one of the changes we had to make in order to use EC2 Container Service for long-running tasks, rather than services spread across many stateless instances
• Disconnects unneeded nodes from the cluster, allowing them to self-terminate when they are idle
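The deck does not say which bin-packing heuristic the Lambda uses. As a sketch of the general idea, the example below uses first-fit decreasing on memory, which is an assumption; the function name pack_tasks, the task names, and the instance capacity are hypothetical.

```python
def pack_tasks(task_memory, instance_capacity):
    """First-fit decreasing: place each task on the first instance with room.

    task_memory maps task name -> memory in MiB. Returns a list of instances,
    each a list of task names. Raises ValueError if a task cannot fit at all.
    """
    instances = []  # each entry: [free_memory, [task names]]
    # Largest tasks first; ties broken by name so the result is deterministic.
    ordered = sorted(task_memory.items(), key=lambda item: (-item[1], item[0]))
    for name, memory in ordered:
        if memory > instance_capacity:
            raise ValueError(f"{name} needs {memory} MiB, capacity is {instance_capacity}")
        for slot in instances:
            if slot[0] >= memory:
                slot[0] -= memory
                slot[1].append(name)
                break
        else:
            # No existing instance has room, so another one is needed.
            instances.append([instance_capacity - memory, [name]])
    return [names for _, names in instances]
```

Packing jobs tightly rather than spreading them leaves whole instances empty, and those are the nodes that can be disconnected from the cluster and terminated when idle.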

Spot Instances


Versioned Jobs

Scripts checked into repositories make a transition to Docker jobs easy

Versioned Jobs With EC2 Container Service

• Versioned build and test scripts can now be run in versioned Docker containers, using versioned task definitions
• Creates extreme flexibility
• CloudFormation allows us to stand up whole new clusters with all different versions in a matter of minutes for long-term testing

EC2 Container Service + Docker Problems

• Docker containers not launching

• EC2 Container Service agent failing

• Docker containers stopping

• Incompatibility with certain services

• Docker OS availability

• Cleanup

• Image size

EC2 Container Service Feature Requests

• Elastic Load Balancer
• Dynamic port mapping to containers
• Fail health based on HTTP return code
• Different health endpoint for adding vs removing
• Bin packing scheduler
• Could provide better cost management reporting and tools
• Ability to mark container instances as un-schedulable
• Remove sharp edges around the stopped state
• Give Auto Scaling Groups the ability to set Elastic Compute Cloud instance “shutdown behavior”
• Periodic cleanup process in Elastic Container Service to deregister stopped instances

EC2 Container Service Takeaways

• /etc/ecs/ecs.config
• ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr)
• ECS_LOGLEVEL=debug
• Beware of running services in the same cluster that use the same ports
• Tune Elastic Load Balancer health check
• Docker 1.10 for security enhancements
• Canary & Blue/Green: separate service attached to the same Elastic Load Balancer
• Rollback is trivial
• Elastic Container Service is incredibly easy to get up and running
• The ecosystem is changing quickly; we are moving cautiously
• Holding off on stateful services in Docker

Building CI with Amazon Web Services

• Elastic Compute Cloud
• Simple Queue Service
• Lambda
• EC2 Container Service
• Simple Storage Service
• Relational Database Service
• Kinesis
• EC2 Spot Instances
• EC2 Container Registry
• CloudFormation
• Simple Notification Service
• CloudWatch
• CloudTrail

Future

Expand Use

• Use EC2 Container Service for more services

• Allow developers to control their test suites and Docker images more directly

• Developer Environments

• Use docker for local long running services

• Use a VM running the same version OS

• Remote updates to keep it in line with CI

• Aim to enable running CI containers right out of the box

Result: Happy Engineering Team

• Developers can write more tests, more quickly.

• Happy devs, timely build/test status feedback.

• Happy quality team, all tests are run at each commit.

• Happy ops team, release candidate produced quickly.

• Happy management, infra budget is under control.

Thank You

Join us @Okta - www.okta.com/company/careers/

stackshare.io/okta/okta

Remember to complete

your evaluations!