monitoring docker containers - docker nyc feb 2015

Monitoring and Running Docker Containers at Scale Docker NYC Meetup February 25th, 2015

Upload: datadogslides

Post on 16-Jul-2015

TRANSCRIPT

Monitoring and Running Docker Containers at Scale

Docker NYC Meetup, February 25th, 2015

@alq — CTO at Datadog

Datadog

• Monitoring service

• Made for the cloud

• Aggregates everything

• Support for Docker (since 1.0)

Goal of this talk

Rethink the monitoring of Docker containers

Agenda

1. A (very) brief history of containers

2. Operational complexity

3. Monitoring Docker effectively

4. Demo

A brief history of containers

Containers in a nutshell

• Been around for a long time

– jails, zones, cgroups

• No full-virtualization overhead

• Used for runtime isolation (e.g. jails)

• Docker is an Escape from Dependency Hell

Escape from dependency hell

a.out, shared libs, packages, omnibus

Docker ==?

Mini-host or über-process?

                 Process       Container         Host
Spec             Source        Dockerfile        Kickstart
On disk          .TEXT         /var/lib/docker   /
In memory        PID           Container ID      Hostname
In the network   Socket        veth*             eth*
Runtime context  server core   host              data center

Mini-host or über-process?

Operational complexity

Combinatorial multiplication

[Diagram: three stacks. First: Hardware, OS, Off-the-shelf, Your Application. Second: Hardware, Hypervisor, two OSes, each running Off-the-shelf and App. Third: Hardware, Hypervisor, two OSes, Containers labeled A and O.]

Operational complexity

• Average containers per host: N (N=5, 10/2014)

• N-times as many “hosts” to manage

• Affects

– provisioning: prep’ing & building containers

– configuration: passing config to containers

– orchestration: deciding where/when containers run

– monitoring: making sure containers run properly

Complexity increases with...

1. Number of things to measure

2. Velocity of change

Number of things to measure

• 1 Amazon EC2 instance

– 10 CloudWatch metrics

• 1 operating system (e.g. Linux)

– 100 metrics

• N containers

– 100*N metrics

• 110 + 100*N metrics per instance

Combinatorial multiplication

100 instances, 500 containers

Assuming only 5 containers per instance

Combinatorial multiplication

160 metrics per host, 610 metrics per host

Assuming only 5 containers per instance

Combinatorial multiplication

100 instances, 61,000 metrics

Assuming only 5 containers per instance
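The arithmetic behind these slides can be checked with a short sketch. It uses only the round numbers from the talk (10 CloudWatch metrics, 100 OS metrics, 100 metrics per container); the function and constant names are made up for illustration, and real metric counts will vary by setup.

```python
# Round figures taken from the slides; actual counts depend on the environment.
CLOUDWATCH_METRICS = 10       # per EC2 instance
OS_METRICS = 100              # per operating system
METRICS_PER_CONTAINER = 100   # per container


def metrics_per_instance(containers: int) -> int:
    """110 + 100*N metrics for an instance running N containers."""
    return CLOUDWATCH_METRICS + OS_METRICS + METRICS_PER_CONTAINER * containers


def fleet_totals(instances: int, containers_per_instance: int) -> dict:
    """Scale the per-instance estimate up to a whole fleet."""
    per_instance = metrics_per_instance(containers_per_instance)
    return {
        "containers": instances * containers_per_instance,
        "metrics_per_instance": per_instance,
        "metrics": instances * per_instance,
    }


totals = fleet_totals(instances=100, containers_per_instance=5)
print(totals)
# {'containers': 500, 'metrics_per_instance': 610, 'metrics': 61000}
```

Changing `containers_per_instance` shows how quickly the total grows: every extra container per instance adds 100 metrics per host under these assumptions.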

Velocity

Host half-life: hours, days, months

Container half-life: minutes, hours, days

Aggravating factors

• Registry-based provisioning

– new images as fast as you can git commit

• Autonomic orchestration

– from imperative to declarative

– automated

– individual containers don’t matter

– e.g. Kubernetes, Mesos

A lot more,

A lot faster.

If your monitoring is still centered on individual hosts or instances…

Host-centric monitoring

[Diagram: the Hypervisor, OS, and Containers (A and O) stack, with two monitors and a GAP between them.]

A lot more pain,

A lot faster.

Monitoring containers effectively

A new approach to container monitoring

Layers + Tags

Layers of monitoring

[Diagram: a single monitor alongside the Hypervisor, OS, and Containers (A and O) stack.]

Layers of monitoring

[Diagram: CloudWatch, Infrastructure Monitoring, and APM alongside the Hypervisor, OS, and Containers (A and O) stack.]

Layers of monitoring

[Diagram: CloudWatch, Infrastructure Monitoring, and APM alongside the Hypervisor, OS, and Containers (A and O) stack, with example metrics: cpu/net/io, filesystem, docker mem, docker cpu, db queries, web requests, app throughput.]

Layers of monitoring

• Access to metrics from all the layers

• Amazon CloudWatch, OS metrics, Docker metrics, app metrics in 1 place

• Shared timeline
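One way to picture "all layers in 1 place on a shared timeline" is to merge per-layer metric streams by timestamp. The sketch below is only an illustration: the `Point` type, the sample values, and the metric names are hypothetical, and it assumes each layer's stream is already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Point(NamedTuple):
    timestamp: int  # seconds from an arbitrary start (hypothetical data)
    layer: str      # which monitoring layer produced the point
    metric: str
    value: float


def shared_timeline(*streams: Iterable[Point]) -> Iterator[Point]:
    """Merge time-sorted per-layer streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda p: p.timestamp)


# Made-up sample points for three layers.
cloudwatch = [Point(0, "cloudwatch", "cpu", 41.0), Point(60, "cloudwatch", "cpu", 87.0)]
docker = [Point(10, "docker", "docker.mem", 512.0), Point(50, "docker", "docker.mem", 900.0)]
app = [Point(55, "app", "web.requests", 120.0)]

timeline = list(shared_timeline(cloudwatch, docker, app))
for point in timeline:
    print(point.timestamp, point.layer, point.metric, point.value)
```

With every layer on the same time axis, a spike in an application metric can be read next to the container and instance metrics from the same moment.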

If monitoring does not cover all layers, pain.

Tags (a.k.a. labels)

You (probably) already use them

Tags

• Monitoring is like Auto-Scaling Groups

• Monitoring is like Docker orchestration

• From imperative to declarative

• Query-based

• Queries operate on tags

Monitoring with tags and queries

“Monitor all Docker containers running image web”

“… in region us-west-2 across all availability zones”

“… and make sure resident set size < 1GB on c3.xl”

Monitoring with tags and queries

“Monitor all Docker containers running image web”

“… in region us-west-2 across all availability zones”

“… that use more than 1.5x the average on c3.xl”
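The queries above can be sketched as filters over tagged containers. This is a minimal illustration, not Datadog's query language: the `Container` type, the tag keys (`image`, `region`, `instance_type`), and the sample fleet are all assumptions made for the example.

```python
from statistics import mean
from typing import Dict, List, NamedTuple

MIB = 1024 ** 2
GIB = 1024 ** 3


class Container(NamedTuple):
    name: str
    tags: Dict[str, str]  # hypothetical tag keys: image, region, instance_type
    rss_bytes: int        # resident set size


def select(containers: List[Container], **wanted: str) -> List[Container]:
    """Keep containers whose tags match every key=value pair in `wanted`."""
    return [c for c in containers
            if all(c.tags.get(k) == v for k, v in wanted.items())]


def over_limit(containers: List[Container], limit_bytes: int) -> List[Container]:
    """Containers that break the 'resident set size < limit' rule."""
    return [c for c in containers if c.rss_bytes >= limit_bytes]


def over_average(containers: List[Container], factor: float) -> List[Container]:
    """Containers using more than `factor` times the group's average RSS."""
    if not containers:
        return []
    average = mean(c.rss_bytes for c in containers)
    return [c for c in containers if c.rss_bytes > factor * average]


def tagged(image: str, region: str) -> Dict[str, str]:
    return {"image": image, "region": region, "instance_type": "c3.xl"}


# Made-up fleet for illustration only.
fleet = [
    Container("web-1", tagged("web", "us-west-2"), 400 * MIB),
    Container("web-2", tagged("web", "us-west-2"), 500 * MIB),
    Container("web-3", tagged("web", "us-west-2"), 1600 * MIB),
    Container("web-4", tagged("web", "us-east-1"), 2000 * MIB),
    Container("db-1", tagged("db", "us-west-2"), 3000 * MIB),
]

web = select(fleet, image="web", region="us-west-2", instance_type="c3.xl")
print([c.name for c in over_limit(web, 1 * GIB)])
print([c.name for c in over_average(web, 1.5)])
```

The monitor is defined by the query rather than by a list of hosts, so containers that start or stop are picked up or dropped as long as they carry the right tags.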

Demo: layers & tags

Take-aways

1. Docker increases operational complexity by an order of magnitude unless…

2. You have layered monitoring, from the instance to the container and to the application, and…

3. You monitor using tags and queries