Tokyo OpenStack Summit 2015: Unraveling Docker Security


Post on 15-Apr-2017






  • Unraveling Docker Security: Lessons From a Production Cloud

    Salman Baset (Research Staff Member), Stefan Berger (STSM), Dimitrios Pendarakis (Manager and Research Staff Member), IBM Research. @salman_baset

    Philip Estes (STSM), IBM Cloud. @estesp

  • Outline

    - What is Docker?
    - Deployment models for Docker
    - Threat model
    - Protection against threats
    - Docker registry and engine configuration
    - Possible attacks
    - Putting it all together

    Acknowledgements: IBM Containers on Bluemix & Docker, OpenStack, and Linux community

  • What is Docker?

    This talk will focus on Docker container security.

    [Figure labels: engine; shared Linux kernel; client/end user]


    Isolation relies on core Linux kernel technologies: cgroups, namespaces, capabilities, LSM restrictions, etc.

    Build, ship and run distributed applications via a common toolbox...

    Docker is now a fast-growing ecosystem of related projects: Compose, Swarm, Machine, advanced networking, Registry (DTR), Kubernetes/Mesos, among many others.

    $ docker run redis
    $ docker run nginx
    $ docker run ..

  • Deployment Model

    Single tenant, known code:
    - Containers run inside a machine (VM or bare metal)
    - A model like VM-based multi-tenant clouds

    Multi-tenant, unknown code (the security challenge and the focus of this talk):
    - Containers of different tenants run on the same machine, with virtual networks
    - The Docker API is exposed to tenants

    [Figure labels: Host, Host; tenant 1, tenant 2]

  • Threat Model: Container Attacks on Other Containers Running on the Same Machine

    [Figure label: physical or virtual machine]

    1. Which other containers are running, and which processes are other containers running?
       PID TTY TIME CMD
       1 pts/0 00:00:00 bash

    2. Which files are used by other containers?
       ls /root
       myfile

    3. Which network stack is used by other containers? (ifconfig, route, iptables, netstat)

    4. What is the hostname of other containers? (sethostname(), gethostname())

    5. Are processes of other containers doing any IPC? (pipe, semaphore, shared memory, memory-mapped file)


  • Threat Model: Container Attacks on Host Machine

    [Figure labels: misconfigured container; malicious container; physical or virtual machine]

    1. Is root inside a container also root inside the host?

    2. Are CPU, memory, disk, and network limits obeyed?

    3. Can a container gain privileged capabilities?

    4. Are other limits obeyed, e.g., fork(), file descriptors?

    5. Can a container mount or DoS host file systems?


  • Threat Model: Attacks Launched from the Public Internet

    The threat model is similar to that of a VM cloud. Not covered in this talk.

    [Figure label: Docker cloud]

    1. Scan open ports

    2. Guess passwords of common services (e.g., ssh)

    3. (D)DoS


  • Isolating from Other Containers: Kernel Namespaces for a Limited System View

    - PID namespace: process IDs
    - Mount namespace: mount points
    - Network namespace: network interfaces/devices, stacks, ports, etc.
    - UTS namespace: sethostname(), gethostname()
    - IPC namespace: System V IPC, POSIX message queues

    In unprivileged containers, devices must be explicitly passed into the container using the --device option.

    Namespaces are necessary but not sufficient: a container started with privileged capabilities can sneak into other containers and load modules.
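
    One way to observe the namespace isolation listed above from the host is to compare the namespace identifiers that the kernel exposes under /proc/<pid>/ns. The following is a minimal Python sketch, not material from the talk; the helper names (parse_ns_link, read_namespaces, shared_namespaces) are hypothetical, and read_namespaces assumes a Linux host and sufficient permissions to read another process's /proc entries.

```python
import os
import re

# Namespace types listed on the slide, by their /proc/<pid>/ns entry names.
NAMESPACES = ("pid", "mnt", "net", "uts", "ipc")


def parse_ns_link(link):
    """Parse a /proc/<pid>/ns symlink target such as 'pid:[4026531836]'."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", link)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % link)
    return match.group(1), int(match.group(2))


def read_namespaces(pid):
    """Return {namespace: inode} for a process (Linux only)."""
    result = {}
    for name in NAMESPACES:
        kind, inode = parse_ns_link(os.readlink("/proc/%s/ns/%s" % (pid, name)))
        result[kind] = inode
    return result


def shared_namespaces(a, b):
    """Namespaces in which two processes have the same view (same inode)."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

    Two containers that are properly isolated from each other should yield an empty list from shared_namespaces for the five types above; any shared entry is a view the two processes have in common.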


  • Isolating from Host

    - User namespaces
    - Linux capabilities
    - Linux security modules: AppArmor/SELinux
    - Docker API
    - Docker engine and storage configuration

    [Figure label: physical or virtual machine]

  • Isolating from Host: User Namespaces

    Key benefit of user namespaces: a deprivileged root user.

    $ docker run --name cntr -v /bin:/host/bin -ti busybox
    / # id
    uid=0(root) gid=0(root) groups=10(wheel)
    / # cd /host/bin
    /host/bin # mv sh old
    mv: can't rename 'sh': Permission denied
    /host/bin # cp /bin/busybox ./sh
    cp: can't create './sh': File exists

    [Figure labels: host root; container root]

    $ docker inspect -f '{{ .State.Pid }}' cntr
    8851
    $ ps -u 200000
    PID TTY TIME CMD
    8851 pts/7 00:00:00 sh

    Will be available in Docker 1.9.
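
    The ps -u 200000 output above shows container root running as an unprivileged host user. The kernel records this translation in /proc/<pid>/uid_map, where each line has the form "<first UID inside> <first UID outside> <count>". The Python sketch below illustrates the lookup; the function names are hypothetical, and the example range length of 65536 is an assumption (only the host UID 200000 appears in the talk).

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            ranges.append((inside, outside, count))
    return ranges


def to_host_uid(container_uid, ranges):
    """Translate a UID inside the user namespace to the UID the host sees."""
    for inside, outside, count in ranges:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None  # UIDs outside every range are unmapped


# Assumed mapping: container UIDs 0..65535 start at host UID 200000.
EXAMPLE_MAP = "         0     200000      65536\n"

print(to_host_uid(0, parse_uid_map(EXAMPLE_MAP)))  # container root on the host
```

    Under this assumed mapping, UID 0 in the container is UID 200000 on the host, which is why the bind-mounted /host/bin files owned by host root cannot be modified.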

  • Isolating from Host (and Other Containers): Control Groups

    Resource control:
    - CPU
    - Memory
    - Swap
    - Blkio
    - Network

    [Figure label: physical or virtual machine]

    docker run --cpuset-cpus=0,1 --cpu-shares=512 -m 2G --memory-swap 2G --blkio-weight 500
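
    The memory options in the command above interact: --memory-swap is the combined limit for memory plus swap, so setting it equal to -m leaves the container no swap. The Python sketch below works through that arithmetic; the helper names are hypothetical, and the special value -1 (unlimited swap) is deliberately not handled.

```python
# Size suffixes accepted by the docker run memory options.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '2G' or '512m' to bytes (a bare number is bytes)."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def swap_allowance(memory, memory_swap):
    """Swap available to the container, in bytes.

    --memory-swap is the total of memory and swap, so the swap share is the
    difference between the two values.
    """
    mem, total = parse_size(memory), parse_size(memory_swap)
    if total < mem:
        raise ValueError("--memory-swap must not be smaller than --memory")
    return total - mem


print(swap_allowance("2G", "2G"))  # values from the slide's command line
```

    With the slide's values the allowance is zero bytes, meaning the container is held to 2G of RAM and cannot spill into swap.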

  • Isolating from Host (and Other Containers): cgroups

    Docker's cgroup support is a work in progress:
    - New command line options are being added
    - Network cgroup: currently not implemented

    Linux kernel:
    - cgroups for PID coming in 4.3

    Current cgroup limitations:
    - Blkio: Bps enforcement seems difficult
    - Memory: needs configuration tweaking to enforce swap limits
    - No accounting for size of PID space

    cgroup v2 has now been added to Linux:
    - Redesigned and improved interface
    - New hierarchical organization

  • Isolating from Host (and Other Containers): Linux Capabilities

    Linux capabilities: a fine-grained access control mechanism beyond root/non-root.
    - Restrict the capabilities available to a process (or a thread), e.g., load kernel modules, mount, network admin operations, set time
    - Docker by default drops the majority (24 out of 37)
    - Capabilities can be added to a Docker container, e.g., docker run --cap-add=SYS_ADMIN (the capability that mount() checks)

    [Figure labels: physical or virtual machine; system call; open(), mount()]

    $ cat /proc/self/status | grep Cap
    CapInh: 00000000a80425fb
    CapPrm: 00000000a80425fb
    CapEff: 00000000a80425fb
    CapBnd: 00000000a80425fb

    Default Docker capabilities: chown, dac_override, fsetid, fowner, mknod, net_raw, setgid, setuid, setfcap, setpcap, net_bind_service, sys_chroot, kill, audit_write
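
    The hexadecimal masks in /proc/self/status and the list of default capabilities are the same information: each set bit in the mask is one capability. The Python sketch below decodes a mask into names, assuming the bit numbering from the kernel's linux/capability.h header; the function name decode_caps is hypothetical.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid", "kill",
    "setgid", "setuid", "setpcap", "linux_immutable", "net_bind_service",
    "net_broadcast", "net_admin", "net_raw", "ipc_lock", "ipc_owner",
    "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace", "sys_pacct",
    "sys_admin", "sys_boot", "sys_nice", "sys_resource", "sys_time",
    "sys_tty_config", "mknod", "lease", "audit_write", "audit_control",
    "setfcap", "mac_override", "mac_admin", "syslog", "wake_alarm",
    "block_suspend", "audit_read",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities whose bits are set in the mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


# CapEff value shown on the slide.
for name in decode_caps("00000000a80425fb"):
    print(name)
```

    Decoding the slide's CapEff value yields the 14 default capabilities listed above; sys_module and sys_admin are absent, which is why a default container cannot load kernel modules or mount file systems.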

  • Isolating from Host (and Other Containers): LSM

    Linux security modules provide mandatory access control.

    AppArmor defines restrictions on file access, capability, network, and mount.

    [Figure labels: physical or virtual machine; AppArmor policy; open(/etc/hosts,), open(/dev/kmem,)]

    Default Docker AppArmor profile for containers:
    - Denies access to sensitive data, e.g., LSM path on host, kernel memory
    - Denies unmount

    There is one single profile for all containers. A custom profile can be defined per container.

  • Isolating from Host: Seccomp

    Restrict the system calls that the calling thread is permitted to execute.

    Example: the CAP_SETUID capability covers four system calls: setuid(), setreuid(), setresuid(), setfsuid(). Seccomp can restrict which of the calls within the CAP_SETUID capability may be made.

    [Figure labels: physical or virtual machine; system call; setuid(), setreuid()]
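
    The difference in granularity between a capability and a seccomp filter can be illustrated with a toy model. This is plain Python, not the kernel's seccomp interface or Docker's profile format; all function and variable names are hypothetical, the syscall list is the one given on the slide, and the kernel's real permission rules for these calls are more nuanced than the all-or-nothing check modeled here.

```python
# System calls the slide associates with the CAP_SETUID capability.
SETUID_SYSCALLS = {"setuid", "setreuid", "setresuid", "setfsuid"}


def allowed_by_capability(syscall, has_cap_setuid):
    """Capability check: the four calls are permitted or denied together."""
    if syscall in SETUID_SYSCALLS:
        return has_cap_setuid
    return True


def allowed_by_filter(syscall, denied_syscalls):
    """Seccomp-style check: each system call is decided individually."""
    return syscall not in denied_syscalls


def is_permitted(syscall, has_cap_setuid, denied_syscalls):
    """A call goes through only if both layers allow it."""
    return (allowed_by_capability(syscall, has_cap_setuid)
            and allowed_by_filter(syscall, denied_syscalls))


# With CAP_SETUID held, a filter can still deny one call out of the four.
print(is_permitted("setuid", True, {"setfsuid"}))
print(is_permitted("setfsuid", True, {"setfsuid"}))
```

    In this model a process holding CAP_SETUID keeps setuid() while losing setfsuid(), a distinction the capability alone cannot express.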


  • Isolating from Host: Restrict Docker API

    The Docker engine exposes an API. The API is powerful and can perform admin operations, e.g., create privileged containers.

    In the near future, each API call will have authentication and authorization.

    Until then, restrict the APIs available to an end user of a container cloud, e.g.:
    - Prevent privileged container creation (docker run --privileged)
    - Prevent addition of capabilities (docker run --cap-add)
    - Ensure the appropriate AppArmor profile is used (docker run --security-opt=apparmor:profile)
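
    A cloud operator could enforce the three restrictions above in a proxy placed in front of the Docker API. The Python sketch below checks the body of a container-create request; it is an illustration, not the mechanism used in the production cloud described in the talk. The field names HostConfig.Privileged, HostConfig.CapAdd, and HostConfig.SecurityOpt follow the Docker Remote API, while the function name, the policy, and the choice of docker-default as the required profile are assumptions.

```python
def check_create_request(body, required_profile="docker-default"):
    """Return a list of policy violations for a container-create request body."""
    host_config = body.get("HostConfig") or {}
    violations = []

    if host_config.get("Privileged"):
        violations.append("privileged containers are not allowed")

    if host_config.get("CapAdd"):
        violations.append("adding capabilities is not allowed")

    # Exactly one AppArmor option must be present, naming the required profile.
    wanted = "apparmor:" + required_profile
    apparmor_opts = [opt for opt in (host_config.get("SecurityOpt") or [])
                     if opt.startswith("apparmor")]
    if apparmor_opts != [wanted]:
        violations.append("AppArmor profile must be " + required_profile)

    return violations


request = {
    "Image": "busybox",
    "HostConfig": {"Privileged": True, "CapAdd": ["SYS_ADMIN"]},
}
for problem in check_create_request(request):
    print(problem)
```

    A proxy built this way would reject the request when the returned list is non-empty; an alternative design is to overwrite the offending fields rather than reject the request.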

  • Isolating from Host: Docker Engine and Storage Configuration

    Docker Engine:
    - Configure TLS for the Docker Engine
    - Set appropriate limits, e.g., nproc, file descriptors
    - Docker Security Checklist and Docker Bench

    Docker Storage:
    - Consider using devicemapper as storage
    - Consider setting the default filesystem of containers as read only
    - Bind-mounted files in Docker have no quota; consider making them read only

  • Docker Registry Security

    Python-based Docker registry V1 weaknesses:
    - Image IDs are secrets (effectively)
    - No content verification; audit/validation difficult
    - Layer IDs randomly assigned, linked via parent entries (poor performance)

    Docker Registry V2 API and implementation in Docker 1.6:
    - All content is addressable via strong cryptographic hash
    - Content and naming separated
    - Safe distribution over untrusted channels; data is verifiable
    - Signing and verification now enabled via Docker Content Trust
    - Digests and manifests together uniquely define content and relationships
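
    Content addressing is what makes distribution over untrusted channels safe: a client that knows the digest can recompute the hash of the bytes it received and reject anything that does not match. The Python sketch below shows the check using SHA-256 and the "sha256:<hex>" digest notation; the function names are hypothetical and the sample bytes stand in for a real image layer.

```python
import hashlib


def blob_digest(data):
    """Compute the content digest of a blob in 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_blob(data, expected_digest):
    """Return True if the received bytes match the digest they were requested by."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError("unsupported digest algorithm: %r" % algorithm)
    return hashlib.sha256(data).hexdigest() == expected_hex


layer = b"example layer contents"   # placeholder for real layer bytes
digest = blob_digest(layer)

print(verify_blob(layer, digest))                 # untouched content
print(verify_blob(layer + b"tampered", digest))   # modified in transit
```

    Because the name of the content is derived from the content itself, a registry or network intermediary cannot substitute different bytes without the mismatch being detected.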

  • Forkbomb: DoS on host. Host unusable within seconds.

    Multiple solutions, e.g., limit number of processes in each container using nproc (handled per Linux user) cgroup PID space coming in Linux kernel 4.3 watc