techtalks: taking docker to production

Taking Docker To production JOSA TechTalk by Muayyad Saleh Alsadi http://muayyad-alsadi.github.io/

Upload: muayyad-alsadi

Post on 21-Jan-2017



Page 1: Techtalks: taking docker to production

Taking Docker To production
JOSA TechTalk by Muayyad Saleh Alsadi
http://muayyad-alsadi.github.io/

Page 2: Techtalks: taking docker to production
Page 3: Techtalks: taking docker to production

What is Docker again? (quick review)

Containers

uses linux kernel features like:

● namespaces
● cgroups (control groups)
● capabilities

Platform

Docker is a key component of many PaaS. Docker provides a way to host images, pull them, run them, pause them, snapshot them into new images, view diffs ..etc.

Ecosystem

Like github, Docker Hub provides publicly available community images.

Page 4: Techtalks: taking docker to production

Containers vs. VMs

No kernel in guest OS (shared with host)

containers are more secure and isolated than chroot and less isolated than VM

Page 5: Techtalks: taking docker to production

Why DevOps?

Devs

want change

Ops

wants stability (no change)

DevOps

resolve the conflict.

for devs: docker image contains the same os, same libraries, same version, same config, ...etc.

for admins: host is untouched and stable

Blame each other

Fight each other

Page 6: Techtalks: taking docker to production

Devs Heaven (not for production)

docker compose can bring everything up, connect the containers and link them with a single command. It can mount a local dir inside the container (so that developers can use their favorite IDE). The command is

docker-compose up

it will read “docker-compose.yml” which might look like:

mywebapp:
  image: mywebapp
  volumes:
    - .:/code
  links:
    - redis
redis:
  image: redis

Page 7: Techtalks: taking docker to production

Operations Heaven

Having a stable host!

CoreOS does not include any package manager, and does not even have python or tools installed. They have a Fedora based docker image called toolbox.

You can mix and match. Some containers run Java 6 or Java 7. Some use CentOS 6, others 7, others ubuntu 14.04, others Fedora 22 ..etc. on the same host.

Page 8: Techtalks: taking docker to production

Linking Containers

docker run -d --name r1 redis
docker run -d --name web --link r1:redis myweb

r1 is the container name
redis is the link alias
it will update /etc/hosts and set ENVs:

● <alias>_NAME = /<THIS>/<alias> # e.g. REDIS_NAME=/web/redis
● REDIS_PORT=<tcp|udp>://<IP>:<PORT>
● REDIS_PORT_6379_TCP_PROTO=tcp
● REDIS_PORT_6379_TCP_PORT=6379
● REDIS_PORT_6379_TCP_ADDR=172.17.1.15
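A minimal sketch of how a start script inside the linked container might consume these variables. The function name link_addr is hypothetical, and it assumes Docker upper-cases the alias and turns "-" into "_" when naming the variables:

```shell
#!/bin/sh
# Sketch: turn the link ENVs into "host:port" for a given alias and port.
# usage: link_addr ALIAS PORT   (e.g. link_addr redis 6379)
# Prints <ADDR>:<PORT>; fails if the link variables are not set.
# The arguments are expected to be trusted constants (they go through eval).
link_addr() (
    # assumption: alias is upper-cased and "-" becomes "_" in the ENV names
    prefix=$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')
    eval "addr=\${${prefix}_PORT_${2}_TCP_ADDR:-}"
    eval "port=\${${prefix}_PORT_${2}_TCP_PORT:-}"
    [ -n "$addr" ] && [ -n "$port" ] && printf '%s:%s\n' "$addr" "$port"
)
```

In the web container above, `link_addr redis 6379` would print the address of r1, and a non-zero exit status tells the script that the link is missing.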

Page 9: Techtalks: taking docker to production

Pets vs. Cattle vs. Ants

Pets (virtualization)

The VM has

● lovely distinct names
● emotions
● many highly coupled roles
● if down it’s a catastrophe

Cattle (cloud)

● no names
● no emotions
● single role
● decoupled (loosely coupled)
● load-balanced
● if down other VMs take over.
● VM failure is planned and part of the process

Ants (docker containers)

containers are like cloud vms, no names no emotions, load balanced.

A single host (might be a VM) is highly dense. The host is stable. Large group of containers are designed to fail as part of the process.

Page 10: Techtalks: taking docker to production

What docker is not

● docker is not a hypervisor
  ○ docker is for process containers not system containers
  ○ example of system containers: LXD and OpenVZ
● no systemd/upstart/sysvinit in the container
  ○ docker is for process containers not system containers
  ○ just run apache, nginx, solr, whatever
  ○ TTYs are not needed
  ○ crons are not needed
● Docker is not for multi-tenant

HINT: LXD is stupid way of winning a meaningless benchmark

Page 11: Techtalks: taking docker to production

Docker ecosystem

● CoreOS, Atomic OS, Ubuntu Core
● Openshift (redhat PaaS)
● Cloud Foundry
● Mesos / mesosphere (by Twitter and now apache)
● Google Kubernetes (schedules containers onto hosts)
● Swarm
● etcd/Fleet
● Drone
● Deis, Flynn, Rancher

Page 12: Techtalks: taking docker to production

Docker golden rules

by twitter@gionn:

● only one process per image
● no embedded configuration
● no sshd, no syslog, no tty
● no! you don't touch a running container to adjust things
● no! you will not use a community image

Page 13: Techtalks: taking docker to production

Theory vs. Reality

docker imaginary “unicorn” apps

● statically compiled (no dependencies)
● written in golang
● container ~ 10MB

in the real world

● interpreted application (python, php)

● system dependencies, config files, log files

● multiple processes (nginx, php-fpm)

● container image >500MB

Page 14: Techtalks: taking docker to production

12 Factor - http://12factor.net/

1. One codebase (in git), many deploys
2. Explicitly declare and isolate dependencies
3. Get config from environment or service discovery
4. Treat backing services as attached resources (Database, SMTP, S3, ..etc.)
5. Strictly separate build and run stages (no minify css/js on run stage)
6. Execute the app as one or more stateless processes (data and state are persisted elsewhere apart from the app, no need for sticky sessions)
7. Export a port (an end point to talk to)
8. Scale out via the process model
9. Disposability: Maximize robustness with fast startup and graceful shutdown
10. Keep development, staging, and production as similar as possible
11. Logs: they are a flow of events written to stdout that is captured by the execution env.

Page 15: Techtalks: taking docker to production

12 Factor

last factor is administrative processes

● Run admin/management tasks as one-off processes
  ○ in django: manage.py migrate
● One-off admin processes should be run in an identical environment as the regular long-running processes of the app
● shipped from same code (same git repo)

Example of 12 Factor: bedrock - a 12 factor wordpress
https://roots.io/bedrock/

Page 16: Techtalks: taking docker to production

12 Factor - Factorish

can be found on https://github.com/factorish/factorish

example: https://github.com/factorish/factorish-elk

Page 17: Techtalks: taking docker to production

Config

● confd
  ○ written in go (a statically linked binary)
  ○ input
    ■ env variables
    ■ service discovery (like etcd and consul)
    ■ redis
  ○ output
    ■ golang template with {{something}}
● crudini, jq
● http://gliderlabs.com/registrator/latest/user/quickstart/
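As an illustration of the same idea without any extra tool, the entry point can render a config file from environment variables with a plain heredoc. This is only a sketch: the function name render_app_ini, the variable names DB_HOST, DB_PORT and APP_DEBUG, and the ini layout are all hypothetical and not taken from confd:

```shell
#!/bin/sh
# Sketch: render an ini file from ENV, falling back to defaults.
# Meant to be called from the entry point before the real process starts.
render_app_ini() {
    cat <<EOF
[database]
host = ${DB_HOST:-localhost}
port = ${DB_PORT:-5432}

[app]
debug = ${APP_DEBUG:-false}
EOF
}
```

A start script would typically redirect the output to the application's config path and then exec the application, so the image itself carries no embedded configuration.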

Page 18: Techtalks: taking docker to production

Config

● container’s entry point (“/start.sh”) calls a REST API to add itself to haproxy or any other load balancer

● container’s entry point uses a discovery service client (ex. etcdctl)

● something listens to docker events and sends each container’s ENV and labels to the discovery service
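The second option could look roughly like the sketch below. It assumes the etcd v2 command line client ("etcdctl set KEY VALUE --ttl SECONDS"); the /services/<name>/<id> key layout and the function name register_self are made up for illustration. ETCDCTL can be overridden (for example with echo) to dry-run it:

```shell
#!/bin/sh
# Sketch: a container announces its own endpoint in etcd from /start.sh.
# usage: register_self SERVICE ENDPOINT [TTL_SECONDS]
register_self() {
    service="$1"
    endpoint="$2"
    ttl="${3:-60}"
    # inside a container the hostname defaults to the short container id
    "${ETCDCTL:-etcdctl}" set \
        "/services/$service/${HOSTNAME:-$(hostname)}" "$endpoint" --ttl "$ttl"
}
```

Because the key expires after the TTL, the entry point would have to refresh it periodically; a dead container then disappears from the discovery service on its own.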

Page 19: Techtalks: taking docker to production

Multiple Process

● supervisord
● runit
● fake systemd
  ○ see free-ipa docker image
  ○ https://github.com/adelton/docker-freeipa

Page 20: Techtalks: taking docker to production

Logging/Monitoring

● ctop
● cadvisor: https://github.com/google/cadvisor
● logstash
● logspout - https://github.com/gliderlabs/logspout

Page 21: Techtalks: taking docker to production

Logging/Monitoring

nginx logging: use “error_log /dev/stderr;” and “access_log /dev/stdout;” with daemon off. For example, in supervisord:

[program:nginx]
directory=/var/lib/nginx
command=/usr/sbin/nginx -g 'daemon off;'
user=root
autostart=true
autorestart=true
redirect_stderr=false
stdout_logfile=/dev/stdout
stderr_logfile=/dev/stderr
stdout_logfile_maxbytes=0
stderr_logfile_maxbytes=0

Page 22: Techtalks: taking docker to production

Logging/Monitoring

Page 23: Techtalks: taking docker to production

Web UI

● tumtum
● cockpit-project.org
● Shipyard
● FleetUI
● CoreGI
● SUSE/Portus

Page 24: Techtalks: taking docker to production

Web UI - cockpit-project

Page 25: Techtalks: taking docker to production

Web UI - shipyard

Page 26: Techtalks: taking docker to production

Web UI - tumtum

Page 27: Techtalks: taking docker to production

Building Docker Images

● Dockerfile and “docker build -t myrepo/myapp .”
  ○ I have a proposal using pivot root inside the Dockerfile (docker build will build the build environment, then use another fresh small container as target, copy the build result and pivot). Docker builder is frozen but details are here
● Dockramp
  ○ https://github.com/jlhawn/dockramp
  ○ external builder written in golang
  ○ uses only docker api (needs new “cp” api)
  ○ can implement my proposal
● Atomic app / Nulecule / openshift have their own way
● Use Fabric/Ansible to build

Page 28: Techtalks: taking docker to production

Simple Duct tape launching.

Systemd @ magic. ex: have container@.service

# systemctl start container@myweb

[Unit]
Description=Docker Container for %I
After=docker.service
Requires=docker.service

[Service]
Type=simple
# make sure the data directory exists
ExecStartPre=/usr/bin/mkdir -p /var/lib/docker/vfs/dir/%i
# the leading "-" ignores failures when no old container exists
ExecStartPre=-/usr/bin/docker kill %i
ExecStartPre=-/usr/bin/docker rm %i
ExecStart=/usr/bin/docker run -i \
    --name="%i" \
    --env-file=/etc/sysconfig/container/%i.rc \
    --label-file=/etc/sysconfig/container/%i.labels \
    -v /var/lib/docker/vfs/dir/%i:/data \
    myrepo/%i

Page 29: Techtalks: taking docker to production

Seriously? Docker on production!

“Docker is about running random code downloaded from the Internet and running it as root.”[1][2]

-- a redhat engineer

Source 1, source 2

Page 30: Techtalks: taking docker to production

● host a private docker registry (so you don’t download random code from random people on internet)

● use HTTPS and be your own certificate authority and trust it on your docker hosts

● use registry version 2 and apply ACL on images
  ○ URLs in v2 look like /v2/<name>/blobs/<digest>

● use HTTP Basic Auth (apache/nginx) with whatever back-end you like (ex. LDAP or just plain files)

● have a Read-Only user as your “deployer” on servers

● have a build server to push images (not developers)

Host your own private registry

Page 31: Techtalks: taking docker to production

“Containers do not contain.”

-- Dan Walsh (Redhat / SELinux)

Seriously? Docker on production!

Page 32: Techtalks: taking docker to production

In May 2015, a catastrophic vulnerability (VENOM) affected KVM/Xen in almost every datacenter.

Fedora/RHEL/CentOS had been secure because of SELinux/sVirt (since 2009)

AppArmor was a joke that is not funny.

http://www.zdnet.com/article/venom-security-flaw-millions-of-virtual-machines-datacenters/
https://fedoraproject.org/wiki/Features/SVirt_Mandatory_Access_Control

Docker and The next Venom?

sVirt does support Docker

What happens in a container stays in the container.

Page 33: Techtalks: taking docker to production

● Drop privileges as quickly as possible

● Run your services as non-root whenever possible
  ○ apache needs root to open port 80, but you are going to proxy the port anyway, so run it as non-root directly

● Treat root within a container as if it is root outside of the container

● do not give CAP_SYS_ADMIN to a container (it’s equivalent to host root)

Recommendations
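A sketch of what these recommendations can look like on the command line, using the --user and --cap-drop options of docker run. The wrapper name run_locked_down and the 1000:1000 uid/gid are illustrative assumptions; DOCKER can be set to echo for a dry run:

```shell
#!/bin/sh
# Sketch: start a container as a non-root user with every capability dropped.
# usage: run_locked_down NAME IMAGE [ARGS...]
run_locked_down() (
    name="$1"
    image="$2"
    shift 2
    # --user:         do not run the service as root inside the container
    # --cap-drop=ALL: no capabilities at all, so no CAP_SYS_ADMIN either
    "${DOCKER:-docker}" run -d --name "$name" \
        --user 1000:1000 \
        --cap-drop=ALL \
        "$image" "$@"
)
```

A service that really needs one capability can get it back with --cap-add for that single capability, which is still much narrower than the default set.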

Page 34: Techtalks: taking docker to production

Setting proper storage backend

● docker info | grep 'Storage Driver'
● possible drivers/backends:
  ○ aufs: a union filesystem of such low quality that it was never part of the official linux kernel
  ○ overlay: a modern union filesystem that was accepted in kernel 3.18 (too young)
  ○ zfs: linux port of the well-established filesystem in solaris. the quality of the port and driver is still questionable
  ○ btrfs: the most featureful linux filesystem. too early to be on production
  ○ devicemapper (thin provisioning): well-established redhat technology (already in production ex. LVM)
● do not use loopback default config in EL (RHEL/CentOS/Fedora)
  ○ WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configuration is strongly discouraged for production use
● in EL edit /etc/sysconfig/docker-storage
● http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
● http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
● http://www.projectatomic.io/docs/docker-storage-recommendation/
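A small sketch that automates the check above. It assumes that "docker info" prints a "Storage Driver:" line and, for a loopback devicemapper setup, "Data loop file:" / "Metadata loop file:" lines; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: inspect `docker info` output that is piped in on stdin.

# print the storage driver name, e.g. "devicemapper"
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p'
}

# succeed if devicemapper is backed by loopback files
uses_loopback() {
    grep -q 'loop file:'
}
```

Usage on a host would be something like `docker info 2>/dev/null | uses_loopback && echo "loopback storage, fix before production"`.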

Page 35: Techtalks: taking docker to production

Storage backend (using script)

man docker-storage-setup
vim /etc/sysconfig/docker-storage-setup
docker-storage-setup

● DEVS="/dev/sdb /dev/sdc"
  ○ list of unpartitioned devices to be used or added
  ○ if you are adding more, remove old ones
  ○ required if VG is specified and does not exist
● VG="<my-volume-group>"
  ○ set to empty to use unallocated space in root’s VG

Page 36: Techtalks: taking docker to production

Storage backend (manual)

pvcreate /dev/sdc
vgcreate direct-lvm /dev/sdc
lvcreate --wipesignatures y -n data direct-lvm -l 95%VG
lvcreate --wipesignatures y -n metadata direct-lvm -l 5%VG
dd if=/dev/zero of=/dev/direct-lvm/metadata bs=1M
vim /etc/sysconfig/docker-storage # to add next line

DOCKER_STORAGE_OPTIONS="--storage-opt dm.metadatadev=/dev/direct-lvm/metadata --storage-opt dm.datadev=/dev/direct-lvm/data"

systemctl restart docker

Page 37: Techtalks: taking docker to production

Docker Volumes

Never put data inside the container (logs, database files, ..etc.). Data should go to mounted volumes.

You can mount folders or files. You can mount RW or RO.

You can have a busybox container with volumes and mount all volumes of that container in another container.

# docker run -d --volumes-from my_vols --name db1 training/postgres
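The my_vols container used above has to exist first. One way to create such a data-only container is sketched below with "docker create"; the helper name make_data_container and the example path are assumptions (the real data path depends on the image). DOCKER can be set to echo for a dry run:

```shell
#!/bin/sh
# Sketch: create (but never start) a busybox container that only owns a volume.
# usage: make_data_container NAME PATH
make_data_container() {
    "${DOCKER:-docker}" create -v "$2" --name "$1" busybox /bin/true
}

# example (hypothetical path):
#   make_data_container my_vols /var/lib/postgresql/data
#   docker run -d --volumes-from my_vols --name db1 training/postgres
```

The database container can then be killed, removed and recreated while the volume stays attached to my_vols.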

Page 38: Techtalks: taking docker to production

Everything is a child process of a single daemon. Seriously!

Seriously? Docker on production!

Page 39: Techtalks: taking docker to production

Docker process model is flawed

Docker daemon launches containers as attached child processes. If the daemon dies, all of them will collapse in a fatal catastrophe. Moreover, docker daemon has so many moving parts. For example, fetching images is done inside the daemon. Bad network while fetching an image or having an evil image might collapse all containers.
https://github.com/docker/docker/issues/15328

An evil client, an evil request, an evil image, an evil container, or an evil “inspect” template might cause docker daemon to go crazy and risk all containers.

Page 40: Techtalks: taking docker to production

Docker process model is flawed

CoreOS introduced a saner process model in rkt (Rocket), an alternative docker-like container runtime. RedHat contributes to both docker and rocket as both have high potential. Rkt is just a container runtime where you can run containers as non-root and without being a child of anything (ex. rely on systemd/D-Bus). Rocket is not a platform (no layers, no image registry service, ..etc.)

https://github.com/coreos/rkt/

Docker might evolve to fix this. dockerlite is a shell script that uses LXC and BTRFS

https://github.com/docker/dockerlite

For now just design your cluster to fail and use anti-affinity

Page 41: Techtalks: taking docker to production

Networking.

Linux Bridges, IPTables NATing, Export ports using a young proxy written in golang. Seriously!

Seriously? Docker on production!

Page 42: Techtalks: taking docker to production

Docker Networking now

Docker uses Linux bridges, which only connect containers within the same host. Containers on host A can’t talk to containers on host B! It uses NAT to talk to the outside world:

# iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE

Exported ports in docker are done via a docker proxy process (written in go). check “netstat -tulnp”

Deprecated geard used to connect multiple hosts using NAT and configured each container to talk to localhost for anything (ex. talk to localhost MySQL and NAT will take it to MySQL container on another host):

# iptables -t nat -A PREROUTING -d ${local_ip}/32 -p tcp -m tcp --dport ${local_port} -j DNAT --to-destination ${remote_ip}:${remote_port}
# iptables -t nat -A OUTPUT -d ${local_ip}/32 -p tcp -m tcp --dport ${local_port} -j DNAT --to-destination ${remote_ip}:${remote_port}
# iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source ${container_ip}

Page 43: Techtalks: taking docker to production

Docker Networking now

A similar approach is to manually hard-code and divide the docker bridges: each host gets 172.16.X.y, where X is the host and y is the container, and NAT delivers the packets (or 172.X.y.y, depending on the number of hosts and the number of containers on each host).

http://blog.sequenceiq.com/blog/2014/08/12/docker-networking/

Given a remote host with IP 192.168.40.12 and its docker0 bridge on 172.17.52.0/24, and given a host with docker0 on 172.17.51.0/24, on the latter host type:

route add -net 172.17.52.0 netmask 255.255.255.0 gw 192.168.40.12
iptables -t nat -F POSTROUTING # or pass "--iptables=false" to docker daemon
iptables -t nat -A POSTROUTING -s 172.17.51.0/24 ! -d 172.17.0.0/16 -j MASQUERADE
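With more than two hosts the same commands have to be repeated for every peer. The sketch below only prints them, following the 172.17.X.0/24 per-host layout of the example above; the function names and the idea of a numeric host index X are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: derive per-host bridge subnets and the matching route commands.

# bridge_subnet X -> prints the docker0 subnet of host number X (0..255)
bridge_subnet() {
    [ "$1" -ge 0 ] 2>/dev/null && [ "$1" -le 255 ] || return 1
    printf '172.17.%s.0/24\n' "$1"
}

# peer_route X HOST_IP -> prints the route that reaches host X's containers
peer_route() {
    bridge_subnet "$1" >/dev/null || return 1
    printf 'route add -net 172.17.%s.0 netmask 255.255.255.0 gw %s\n' "$1" "$2"
}
```

On the host with index 51, `peer_route 52 192.168.40.12` prints exactly the first command shown above; its output can be reviewed and then piped to sh.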

Page 44: Techtalks: taking docker to production

Docker Networking Alternatives

● OpenVSwitch (well-established production technology)
● Flannel (young project from CoreOS written in golang)
● Weave (https://github.com/weaveworks/weave)
● Calico (https://github.com/projectcalico/calico)

Page 45: Techtalks: taking docker to production

Docker Networking Alternatives

OpenVSwitch: just like a physical switch, this virtual switch connects different hosts.

One setup would be connecting each container to OVS without a bridge: “docker run --net=none”, then use the ovs-docker script.

The other setup just replaces the docker0 bridge with one that is connected to OVS (no change needs to be made to each container).

Page 46: Techtalks: taking docker to production

Docker Networking Alternatives

# ovs-vsctl add-br sw0

or /etc/sysconfig/network-scripts/ifcfg-sw0
then

# ip link add veth_s type veth peer name veth_c
# brctl addif docker0 veth_c
# ovs-vsctl add-port sw0 veth_s

see /etc/sysconfig/network-scripts/ifup-ovs

http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=rhel/README.RHEL;hb=HEAD

Page 47: Techtalks: taking docker to production

Networking: the future

In the future, libnetwork will allow docker to use SDN plugins. Docker acquired SocketPlane to implement this.

https://github.com/docker/libnetwork
https://github.com/docker/libnetwork/blob/master/ROADMAP.md

Page 48: Techtalks: taking docker to production

Introducing Docker Glue

● docker-glue - modular pluggable daemon that can run handlers and scripts
● docker-balancer - a standalone daemon that just updates haproxy (a special case of glue)

https://github.com/muayyad-alsadi/docker-glue

autoconfigure haproxy to pass traffic to your containers

uses docker labels “-l” to specify http host or url prefix

# docker run -d --name wp1 -l glue_http_80_host='wp1.example.com' mywordpress/wordpress
# docker run -d --name wp2 -l glue_http_80_host='wp2.example.com' mywordpress/wordpress
# docker run -d --name panel -l glue_http_80_host=example.com -l glue_http_80_prefix=dashboard/ myrepo/control-panel

Page 49: Techtalks: taking docker to production

Introducing Docker Glue

run anything based on docker events (test.ini)

[handler]
class=DockerGlue.handlers.exec.ScriptHandler
events=all
enabled=1
triggers-none=0

[params]
script=test-handler.sh
demo-option=some value

# it will run
test-handler.sh /path/to/test.ini <EVENT> <CONTAINER_ID>

Page 50: Techtalks: taking docker to production

Introducing Docker Glue

#! /bin/bash

cd "$(dirname "$0")"

# print a message to stderr and abort
function error() {
    echo "$@" >&2
    exit 1
}

[ $# -ne 3 ] && error "Usage: $(basename "$0") config.ini status container_id"
ini="$1"
status="$2"
container_id="$3"
# read the optional demo-option from the [params] section (empty if missing)
ini_demo_option=$( crudini --inplace --get "$ini" params demo-option 2>/dev/null || : )
echo "$(date +%F) container_id=[$container_id] status=[$status] ini_demo_option=[$ini_demo_option]" >> /tmp/docker-glue-test.log

Page 51: Techtalks: taking docker to production

Resources

● http://opensource.com/business/14/7/docker-security-selinux

● http://opensource.com/business/14/9/security-for-docker

● http://www.projectatomic.io/blog/2014/09/yet-another-reason-containers-don-t-contain-kernel-keyrings/

● http://developerblog.redhat.com/2014/11/03/are-docker-containers-really-secure-opensource-com/

● https://www.youtube.com/watch?v=0u9LqGVK-aI

● https://github.com/muayyad-alsadi/docker-glue

● http://blog.sequenceiq.com/blog/2014/08/12/docker-networking/

● https://docs.docker.com/userguide/dockervolumes/

● https://docs.docker.com/userguide/dockerlinks/

● https://docs.docker.com/articles/networking/

● https://github.com/openvswitch/ovs/blob/master/INSTALL.Docker.md

● http://radar.oreilly.com/2015/10/swarm-v-fleet-v-kubernetes-v-mesos.html

Page 52: Techtalks: taking docker to production

Q & A