DockerCon EU 2015: Persistent, stateful services with docker clusters, namespaces and docker volume magic

Michael Neale, Co-founder, CloudBees (that Jenkins company)

Upload: docker-inc

Post on 12-Jan-2017

Agenda

Background
    Use-case for stateful services
    Docker volumes
    Quick namespaces revision
    nsenter

Mounts and Volumes
    It’s all files (part 1)
    the mount namespace
    creating bind mounts
    docker volume api (use it!)

Supercontainers and storage
    Privileges
    It’s all files (part 2)
    Controlling the host and peer containers
    Storage engines

Stateful docker clusters
    “off the shelf” cluster scheduling
    The solution chosen
    Other tools out there
    Credits…

Background
The Need for Stateful Services

Basis of this presentation:

.. was learned while building an elastic and scalable Jenkins based product for multiple cloud environments, on docker

“No containers were hurt as part of this production.”
- Michael Neale

My history with docker

Ex Red Hat, where I heard about “control groups”
Starting CloudBees, looking at ways to fairly multi-tenant
Later would discover (and with much help) use LXC
Saw a video of Solomon demoing docker and didn’t believe it
Still didn’t believe it
For the longest time didn’t believe it

CloudBees & Docker

Actually spoke about this at DockerCon 2014 (the first one!)
cgroups -> LXC -> LXC + ZFS copy-on-write
Like dotCloud - ran a PaaS (as well as CI/CD toolchain)
In 2014 moved to focus on CI/CD (dotCloud focussed on docker)
In 2014 moved to adopt docker over LXC (and ZFS)
Using: Docker Hub (private repos), Private Registry
Many of our customers are commercial users of docker
Docker Jenkins plugins: docker hub, build and publish and many more

Put all the things (OSS and commercial) on docker hub

I started the “official” jenkins image early on
updated now ~weekly (with LTS images also)

one MEEELION ??

A stateless cluster of apps is the dream

But the reality is, many apps still need state, a disk
Databases for example
Hands up who would run Oracle on NFS?

Reality: local disk

Network filesystems are great*
But sometimes you need the data close to the processing
EBS, HDFS, GCP, OpenStack block storage…
BUT: how to balance this need for local state with “ephemeral” servers
Servers come and go, need to restore the data (fast)
Need to backup the data (delta/snapshots - fast)
Alternatives: SANs (reattach volumes to replacement nodes, some clouds also support this)
Reason for backups: resilience. Volumes can disappear too.

Current product

Years of experience with containers
EC2, ZFS, EBS, LXC
learn from it to build something new and “turn key” installable, powered by docker
I accidentally created a cluster scheduler (it happens.. please don’t)
An evolved “pre-docker” system

Aim: a new product

A distributed Jenkins cluster
10000s of “masters”, 100000s of elastic build workers
Utilise “Off The Shelf” expertise based around docker: Mesos, Docker Swarm, Kubernetes
Work within existing constraints of a lively and evolving open source project
(this means accepting local disk state… for now)

Additional Constraints

Only want to depend on docker being present on “worker nodes”
Off the shelf cluster scheduler
Use local disk*
Multiple target clouds to be supported
Multiple storage “engines” to be supported

* Would love to refactor to DB backed

“Storage engines?”

“The thing that backs up and restores local disks”

eg: EBS (snapshots), rsync, NFS, ZFS send …

Same cluster management, same api, different storage tech for different clouds/needs.

Ensures volumes are backed up in a consistent state (using LVM snapshot, xfs_freeze, as needed)
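
The "same api, different storage tech" idea can be sketched as one small interface with interchangeable implementations. This is only an illustration, not the product's code: the class and method names (StorageEngine, backup, restore, DirectoryCopyEngine) are hypothetical, and the toy engine simply copies directories where a real one would call EBS snapshots, rsync, NFS or ZFS send.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageEngine(ABC):
    """Hypothetical common interface; one implementation per storage technology."""

    @abstractmethod
    def backup(self, volume_name: str, local_path: Path) -> str:
        """Save the volume contents and return an opaque backup id."""

    @abstractmethod
    def restore(self, backup_id: str, local_path: Path) -> None:
        """Recreate the volume contents at local_path."""


class DirectoryCopyEngine(StorageEngine):
    """Toy engine that copies directories (stand-in for EBS, rsync, NFS, ZFS send)."""

    def __init__(self, backup_root: Path) -> None:
        self.backup_root = backup_root
        self._counter = 0

    def backup(self, volume_name: str, local_path: Path) -> str:
        self._counter += 1
        backup_id = f"{volume_name}-{self._counter}"
        shutil.copytree(local_path, self.backup_root / backup_id)
        return backup_id

    def restore(self, backup_id: str, local_path: Path) -> None:
        # dirs_exist_ok needs Python 3.8+; it lets us restore onto an existing dir.
        shutil.copytree(self.backup_root / backup_id, local_path, dirs_exist_ok=True)


def demo() -> str:
    """Back up a fake volume, then restore it on a 'replacement server' path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "backups").mkdir()
        data = root / "jenkins_home"
        data.mkdir()
        (data / "config.xml").write_text("<jenkins/>")

        engine: StorageEngine = DirectoryCopyEngine(root / "backups")
        backup_id = engine.backup("master-1", data)

        restored = root / "replacement"
        engine.restore(backup_id, restored)
        return (restored / "config.xml").read_text()
```

The cluster-management side only ever sees the StorageEngine interface, so swapping clouds means swapping the implementation class, nothing else.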

Docker volumes

Docker helpfully lets you bind mount to host
Giving you a choice of ways to get data to the host
Containers can remain ephemeral
However, you need to manage those underlying volumes

Note: you shouldn’t need to do what I did. Use something off the shelf if you can. If you must, there is an excellent docker plugin api and volume plugin api.

Solving local disk with docker

(Sequence diagram; participants: client, cluster sched., docker host, storage)
request app
find free slot
ask for data
provide data
Container fully running with data

Using “trickery”

(Sequence diagram; participants: client, cluster sched., docker host, storage)
request app
find free slot
request data
provide data, bind mount
container starts, asks for dynamic bind mount, waits

With docker volume plugin api

(Sequence diagram; participants: client, cluster sched., docker host, storage)
request app
find free slot
provide data (json)
docker calls volume plugin BEFORE container starts, launches with bind mount

However: Docker plugin api did not exist yet!

I had to make do with “trickery”
Other choices like powerstrip existed, but wanted “standard” docker
And you are here for namespace trickery
So let’s learn from it…

“Hard work pays off eventually, but laziness pays off right now.”
- Unknown

Namespaces - really quick…

Along with cgroups are “foundational tech” for containers
6 types: Mount, UTS, IPC, PID, Network and User
My favourites:
    Mount: filesystem stuff (that I used)
    PID, Network and the exciting User namespaces!

https://lwn.net/Articles/531114/

How do we access these namespaces?

nsenter - command line tool
nsenter allows you to “enter” a namespace and do something in the context of it
Available out of the box in many linux distros now

https://github.com/karelzak/util-linux/blob/master/sys-utils/nsenter.c

https://blog.docker.com/tag/nsenter/

Mounts and Volumes
It’s all files in Linux - part 1

Mount namespace

Containers don’t see all mount points, all devices, just their own
Allows docker’s “bind mount” to work
A “bind mount” in linux is really an “alternative view of an existing directory tree”
A docker bind mount takes that “alternative view” and makes it visible to the container (via its mount namespace)
Magic? No. Linux.

It’s all files, part 1

Start any container
Access docker host and run this to get the pid of the whole container:

docker inspect --format '{{.State.Pid}}' <container id>

You can then see the 6 namespaces in /proc/<PID>/ns:

ls /proc/7865/ns/
ipc mnt net pid user uts
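
The same two steps can be scripted. A minimal sketch, assuming the docker CLI is on the PATH of the host: container_pid shells out to the docker inspect command shown above, and list_namespaces reads the symlinks under /proc/<PID>/ns. Both function names and the proc_root parameter are made up for this illustration.

```python
import os
import subprocess
from pathlib import Path


def container_pid(container_id: str) -> int:
    """Ask the docker CLI for the PID of the container's main process."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Pid}}", container_id],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def list_namespaces(pid: int, proc_root: Path = Path("/proc")) -> dict:
    """Map each namespace name to its symlink target, e.g. 'mnt' -> 'mnt:[...]'.

    proc_root is a hypothetical knob: it defaults to the real /proc but can
    point at another view of it.
    """
    ns_dir = proc_root / str(pid) / "ns"
    return {entry.name: os.readlink(entry) for entry in sorted(ns_dir.iterdir())}
```

Two processes that print the same target for mnt share a mount namespace; a container's main process normally shows a different one from the host's.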

/proc virtual filesystem and nsenter

/proc is a virtual filesystem (http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html)

Run a command inside a given container’s namespace:

nsenter --mount=/proc/$PID/ns/mnt -- /usr/bin/command param

RUN A COMMAND FROM HOST AS IF YOU ARE IN THAT CONTAINER
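
For automation it helps to wrap that command line. A small sketch, assuming nsenter is installed on the host and the caller is root; the helper names are hypothetical. Note that only the mount namespace is entered, so the command is looked up in the container's filesystem.

```python
import subprocess


def nsenter_argv(pid: int, command: list) -> list:
    """Build: nsenter --mount=/proc/<pid>/ns/mnt -- <command...>"""
    return ["nsenter", f"--mount=/proc/{pid}/ns/mnt", "--", *command]


def run_in_mount_namespace(pid: int, command: list) -> str:
    """Run command inside the container's mount namespace; return its stdout."""
    result = subprocess.run(
        nsenter_argv(pid, command),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, run_in_mount_namespace(pid, ["cat", "/proc/mounts"]) would list the mounts as the container sees them.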

“With great nsenter power, comes great responsibility”
- Spider-Man’s Uncle

Creating a bind mount on a running container ( -v /var/foo:/var/bar )

High level steps:

Get the underlying device from the host, into the container
mount the device in the container
bind mount in the container to the “directory you want”
unmount the device in container
remove the initial mount

What you are left with: a bind mount to the volume on the host you wanted in the first place, and only that path. Not the whole device/volume on host.
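
The high level steps above can be written out as the list of commands they expand to. This is a sketch that only builds the command lines, loosely following the approach in the @jpetazzo post credited at the end of the deck; it does not run anything. The function name, the /tmpmnt scratch directory and the host_subdir parameter (the wanted directory, relative to the root of the device's filesystem) are assumptions for illustration, and the device numbers must be the real ones from the host.

```python
def dynamic_bind_mount_plan(pid, device, major, minor, host_subdir, target,
                            tmp_mount="/tmpmnt"):
    """Return the nsenter command lines that add a bind mount to a running container.

    All names here are illustrative. Every command runs in the container's
    mount namespace, so mknod/mount/umount must exist in the container image.
    """
    enter = ["nsenter", f"--mount=/proc/{pid}/ns/mnt", "--"]
    source = f"{tmp_mount}/{host_subdir.strip('/')}"
    steps = [
        # 1. get the underlying device into the container
        ["mknod", "--mode", "0600", device, "b", str(major), str(minor)],
        # 2. mount the whole device at a scratch location
        ["mkdir", "-p", tmp_mount],
        ["mount", device, tmp_mount],
        # 3. bind mount only the directory we want onto the target path
        ["mkdir", "-p", target],
        ["mount", "-o", "bind", source, target],
        # 4. unmount the device and remove the scratch mount point;
        #    the bind mount stays in place
        ["umount", tmp_mount],
        ["rmdir", tmp_mount],
    ]
    return [enter + step for step in steps]
```

With pid 7865, device /dev/sda1 (8, 1), host_subdir "var/foo" and target "/var/bar", the plan ends with the equivalent of -v /var/foo:/var/bar on a container that is already running.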

You don’t need to do all this yourself, ever!

# Using a device’s numbers we can create the same device in container

# use nsenter to create a device file IN the container (using its $PID):
nsenter --mount=/proc/$PID/ns/mnt -- mknod --mode 0600 /dev/sda1 b 8 1

# Now we have the device ALSO in the container!
# We can mount it (normal linux)
# bind mount to the desired directory (also normal linux)!
# all from the host

I told you not to panic!

Now we have a dynamic bind mount

As if we used -v /var/foo:/var/bar on startup
Remember: DON’T DO THIS!
Really: you shouldn’t need to do this yourself. Use the docker plugin volume api! (if you must)

Docker plugin API

Out of process JSON based api (but running on same host)
plugins are installed by putting a file in a directory, and referred by name (minus the extension)
Well defined JSON protocol

https://docs.docker.com/extend/plugin_api/

Docker volume plugin API

docker run -v volumename:/data --volume-driver=mydriver ..

“volumename” is passed to the registered volume-driver (which is listening on http)
volume-driver then prepares the data somewhere on the host, returns where it lives (via json)…
docker then bind mounts it in as /data
All happens BEFORE container starts

https://docs.docker.com/extend/plugins_volume/
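
To make the JSON exchange concrete, here is a sketch of the request handling a volume driver might do, written as a plain function so the HTTP server wiring is left out. The endpoint names and the Mountpoint/Err fields follow the volume plugin protocol as described in the docs linked above; the function name and base_dir (where this toy driver keeps volumes on the host) are assumptions for illustration.

```python
import json
from pathlib import Path


def handle_plugin_request(path: str, body: bytes, base_dir: Path) -> dict:
    """Answer one POST from the docker daemon with a JSON-serialisable dict."""
    request = json.loads(body) if body else {}

    if path == "/Plugin.Activate":
        # Handshake: tell docker which plugin API this process implements.
        return {"Implements": ["VolumeDriver"]}

    name = request.get("Name", "")
    mountpoint = base_dir / name

    if path == "/VolumeDriver.Create":
        mountpoint.mkdir(parents=True, exist_ok=True)
        return {"Err": ""}

    if path in ("/VolumeDriver.Mount", "/VolumeDriver.Path"):
        # A real driver would prepare or restore the data here, before the
        # container starts; this sketch only reports where the data lives.
        if not mountpoint.is_dir():
            return {"Err": f"no such volume: {name}"}
        return {"Mountpoint": str(mountpoint), "Err": ""}

    if path in ("/VolumeDriver.Unmount", "/VolumeDriver.Remove"):
        # Data is deliberately left in place in this sketch.
        return {"Err": ""}

    return {"Err": f"unsupported call: {path}"}
```

Docker takes the returned Mountpoint and bind mounts it into the container as /data, which is the "before container starts" step on the slide.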

Docker volume plugin API

Would not require messing with namespaces
Still allow an out of process “volume service” to take care of messy volume details
However - DOES require you to register the plugin with docker on the host
And less terrifying fun than nsenter and namespaces

If you really must

https://github.com/michaelneale/bind-mount-supercontainer

Sample python code that I prototyped this with. Use with care!

Supercontainers and storage engines
Like containers, only more… uh super…

Supercontainers - concept

Term came from Red Hat: http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/
You have heard of privileged containers?

docker run --privileged ..

Drops the usual capability and device restrictions (the container still runs in its own namespaces)
“Super privileged containers” add in more access to the underlying host…

It’s all files (part 2)

Add in the host root filesystem, docker daemon, and all the rest:

docker run -v /var/run/docker.sock:/var/run/docker.sock \
    --privileged \
    -v /:/media/host \
    my-super-container

Brings in docker socket, and root as /media/host
/media/host then contains ALL devices, virtual files, /proc etc
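
A tiny sketch of the same launch command as data, plus a helper for translating a host path into the path the super container sees. The function names are hypothetical; the flags are exactly the ones on the slide.

```python
def super_container_argv(image: str, host_mount: str = "/media/host") -> list:
    """Build the docker run command line for a super privileged container."""
    return [
        "docker", "run",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",  # talk to the daemon
        "--privileged",
        "-v", f"/:{host_mount}",  # host root filesystem, including /proc and /dev
        image,
    ]


def host_path(path: str, host_mount: str = "/media/host") -> str:
    """Translate an absolute host path to its location inside the super container."""
    return host_mount.rstrip("/") + "/" + path.lstrip("/")
```

For instance, host_path("/proc/1/ns/mnt") gives the path to the host's own mount namespace as seen from inside the super container.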

It’s all files (part 2)

Why? We can do everything we did with nsenter before but from WITHIN a “peer container”
Remember requirements: vanilla docker, only docker installed on host
Use super-container as an “agent” container, do all the automation you could want
No need for extra bits on the host box
Allows using “off the shelf” cluster scheduling (only docker need be installed)

Controlling the host

Host can be accessed from super-container via nsenter
PID of host is 1!

eg, from super-container, get all mounts:

nsenter --mount=/media/host/proc/1/ns/mnt -- cat /proc/mounts

Run a command, from container, on the host (stuff after “--”)
/media/host lets us get to the host. Even devices.

Controlling the host

Host can be accessed from super-container via nsenter
Do all the steps as before, but with “nsenter --mount=/media/host/proc/1/ns/mnt” prefixed

Controlling peer containers from supercontainer

Peers are other “ordinary” containers on the same host as the super container
Peers can be accessed from super-container also via nsenter
Just like before, we use nsenter, with the peer container’s $PID
But prefix it with the host’s filesystem:

nsenter --mount=/proc/$PID/ns/mnt -- ..

becomes:

nsenter --mount=/media/host/proc/$PID/ns/mnt -- ..
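
Host and peer access differ only in the PID, so one helper covers both. A sketch, assuming the host root is mounted at /media/host as on the earlier slide; the function and constant names are made up for this example.

```python
HOST_ROOT = "/media/host"  # assumption: where the host's / is mounted in the super container
HOST_PID = 1               # PID 1 on the host owns the host's mount namespace


def nsenter_from_supercontainer(pid: int, command: list,
                                host_root: str = HOST_ROOT) -> list:
    """Build an nsenter command line that goes through the host's /proc."""
    return ["nsenter", f"--mount={host_root}/proc/{pid}/ns/mnt", "--", *command]


def on_host(command: list) -> list:
    """Command line that runs `command` in the host's mount namespace."""
    return nsenter_from_supercontainer(HOST_PID, command)


def in_peer(peer_pid: int, command: list) -> list:
    """Command line that runs `command` in a peer container's mount namespace."""
    return nsenter_from_supercontainer(peer_pid, command)
```

The peer's PID would come from docker inspect, which the super container can run because it has the docker socket mounted.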

Controlling peer containers

Why?
Once again, use the super-container as the controlling agent on a host
Fewer bits to install on the host

Storage engines

My requirement: multiple implementations for different clouds
Different clouds have different storage engines
Super container great place to host volume service
Different implementations of the service depending on what is on offer
EBS, NFS, openstack, rsync and more
This “volume service super-container” is responsible for backup/restore

Storage engines - eg an AWS region

(Diagram labels: zone-1 and zone-2, each with serverA and serverB; volumes vol-1 and vol-2; snapshots of vol-1 and vol-2; “request backup”)

Snapshots/backups

Snapshots are cheap and quick
Zone resilience
Volumes (ie: disks) are not as durable as snapshots/backups
Similar in other platforms: GCP, OpenStack, Azure.
Google compute persistent disks: also allow attaching a volume read-only to extra instances, for redundancy of compute nodes
In our case: failing over is “restoring from backup” - always test your backups!

Supercontainers - summary

A useful tool for low level control
No need to install bits on the host
Can control peers directly
Could be a great place to host a docker volume plugin implementation
(not currently recommended in Docker plugin api docs)

Stateful clusters
Everyone wants to be stateless…

What we built…

.. an elastic and scalable Jenkins based product for multiple cloud environments, on docker

Cluster schedulers/managers

Remember: I have built schedulers before, would rather not again
Docker Swarm, Mesos/Marathon, Kubernetes etc
Some have concepts of volumes
All can schedule “plain” docker containers
Super containers can give you a way to get lower level access

What we settled on

Super containers to implement volume service
Support for multiple storage engines for different clouds
Scheduled via mesos+marathon
Only docker (+ mesos in this case) required on the hosts
Why mesos: practical choice for us but not a tight coupling
(could mesos be in a super container? probably)
Using containers for all the things: elastic search nodes, builds, even haproxy
For us, 5 minute or event based backups/snapshots are fine

Running supercontainers

Eg. marathon: schedule a super container to run on each host
Constraint on volume service: one per host, size: number of servers in cluster (3 in this case):

(Diagram: three hosts, each running a vol service; the remaining slots hold master, master, elastic search, haproxy and one (free) slot)
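
The "one per host" rule maps onto a Marathon app definition with a hostname UNIQUE constraint and an instance count equal to the cluster size. The sketch below builds such a definition as a Python dict. The field names follow Marathon's app JSON as I understand it and should be checked against the Marathon version in use; the app id, image name and resource figures are placeholders.

```python
import json


def volume_service_app(image: str, cluster_size: int) -> dict:
    """Build a Marathon-style app definition for the volume service (illustrative)."""
    return {
        "id": "/volume-service",          # placeholder app id
        "instances": cluster_size,        # size: number of servers in the cluster
        "cpus": 0.1,                      # placeholder resources
        "mem": 128,
        "constraints": [["hostname", "UNIQUE"]],  # at most one instance per host
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "privileged": True},
            "volumes": [
                {"containerPath": "/var/run/docker.sock",
                 "hostPath": "/var/run/docker.sock", "mode": "RW"},
                {"containerPath": "/media/host", "hostPath": "/", "mode": "RW"},
            ],
        },
    }


def to_json(app: dict) -> str:
    """Serialise the definition for posting to the scheduler's REST API."""
    return json.dumps(app, indent=2, sort_keys=True)
```

With cluster_size=3 this matches the picture on the slide: three volume services, one on each server.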

Working with EBS (an example)

(Sequence diagram; participants: client container, volume service, EBS api)
requests backup
freeze for snapshot
initiate snapshot
unfreeze
backup delta, copy to s3

optimisation: use LVM snapshot instead of freeze
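
The important property of the freeze, snapshot, unfreeze sequence is that the filesystem is always unfrozen, even when the snapshot call fails. A sketch of that ordering with the three actions passed in as callables; in practice they might wrap xfs_freeze -f, the cloud's snapshot API and xfs_freeze -u, but those bindings are assumptions and are left out here.

```python
from contextlib import contextmanager


@contextmanager
def frozen(freeze, unfreeze):
    """Hold the filesystem frozen for the duration of the with-block."""
    freeze()
    try:
        yield
    finally:
        # Runs on success and on failure, so writes are never left blocked.
        unfreeze()


def backup_volume(freeze, initiate_snapshot, unfreeze):
    """Freeze, start the snapshot, unfreeze; return whatever the snapshot call returns."""
    with frozen(freeze, unfreeze):
        return initiate_snapshot()
```

The freeze window only needs to cover initiating the snapshot, which is why the slide shows the unfreeze before the delta is copied to s3.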

Backups, backups

Servers are ephemeral
Servers come and go
Disks are fallible (even if cloud platforms call them “volumes”)
Workload moves around
Restore data when workload is moved to a new location
Delta backups are used to avoid full copies each time

Cluster schedulers/managers

Storage awareness is being built in increasingly (Kubernetes volumes, mesos storage awareness)
Ideal world: your cluster manager will do all this for you.
If you live in that world: congrats. Make yourself a cocktail:

My recipe for no-sugar old fashioned: https://gist.github.com/michaelneale/6034145

“off the shelf” stateful volume tools

Rexray: use volume plugin api for Amazon EBS, Rackspace and more
Flocker from ClusterHQ
Kubernetes volume support
Apache “Mysos”: MySQL service backed up to HDFS on mesos
Tutum from Docker! has support for persistent volumes
Watch this space… (changing constantly)

https://docs.clusterhq.com/en/1.4.0/labs/docker-plugin.html

https://github.com/emccode/rexray

Stateful volumes summary

It is possible with docker
Avoid doing it yourself if someone else already has
Using local filesystem directly does feel a bit like “legacy”
But it is a reality for some apps (especially database services)
Lovely to port everything to be stateless, database backed, blobstore backed, but it takes time
Lean on the capabilities of the underlying platform where you can

Credits

Jérôme Petazzoni (@jpetazzo) - years of inspirational blog posts, hacks on linux/docker/volumes. And great hair. http://jpetazzo.github.io/2015/01/13/docker-mount-dynamic-volumes/ - BTW Jerome - it works for real!
Red Hat for Super Container concepts: Daniel Walsh: http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/
Trevor Jay from Red Hat for some final namespace tips https://securityblog.redhat.com/author/tjay/
I really just mashed up the above concepts: https://michaelneale.blogspot.com.au/2015/02/mounting-devices-host-from-super.html

@jpetazzo’s hair - imminent singularity?

(Chart: x-axis years 2012 to 2015; y-axis 0 to 225; one series, “Region 1”)

Thank you!
Michael Neale