there is no container - ori pekelman

#ContainerDayFR There is no container

Upload: paris-container-day

Post on 24-Jan-2018


Category:

Technology



TRANSCRIPT

Page 1: There is no container - Ori Pekelman

#ContainerDayFR

There is no container

Page 2: There is no container - Ori Pekelman

#ContainerDayFR Paris Container Day 2017

Ori Pekelman

GeekPush at Platform.sh

I am @OriPekelman everywhere (GitHub/Twitter/LinkedIn)

Co-Founder & VP of Marketing for Platform.sh, an innovative second-generation PaaS.

My role usually spans beyond the technological aspects to the business strategy, process design and product marketing.


Page 3: There is no container - Ori Pekelman


We are at Paris Container Day, so I can reasonably assume most people here understand the underpinnings of “containers”. But let’s have a show of hands to see how much time we should spend on each slide.


Group A

I don’t know much about containers. It sounds interesting. I came here to learn.

Page 4: There is no container - Ori Pekelman


Group B

I use Docker. In production. It works and I never had to care about how it is implemented.

Page 5: There is no container - Ori Pekelman


Group C

I implement my own container stuff. I have Kernel-Fu. I know how this stuff is built.

Page 6: There is no container - Ori Pekelman


1. This is meant as an entry-level talk, but I will still discuss some nuts and bolts. So when I am unclear, interrupt me. I don’t mind.

2. I am rusty. They make me do marketing these days. So when I am wrong, interrupt me. I don’t mind.

3. Even more so as we have the incredible honor of having people like Jessie Frazelle with us, people that participated in building many of the nuts and some of the bolts.

So, please, Jessie and you other experts, forgive the depths of my ignorance and any and all lies and errors I am about to spout.


Page 7: There is no container - Ori Pekelman


What do containers solve? Why do we need containers?


Containers allow us to package complex software in a reusable format that is easy to deploy, making automation easier.

Sometimes they make updating software easier (with stateless systems… just build a new one, kill the old).

They have lower overhead in terms of memory usage than VMs, so they are less expensive, and we can have more of them.

They allow us to reason about the systems we run at a coarser granularity. AKA abstraction. In Greek, atom means “that which cannot be divided”. The container is our atom.


Page 13: There is no container - Ori Pekelman


The canonical image of the container is something like


Page 14: There is no container - Ori Pekelman


An orderly world where we put software in opaque boxes


Page 15: There is no container - Ori Pekelman


The boxes have a common, simple interface, that is not influenced by their content


Page 16: There is no container - Ori Pekelman


From the outside we don’t care what is inside. There are no dependencies on the exterior world.


Page 17: There is no container - Ori Pekelman


That is our intuitive abstraction popularized by Docker™


Page 18: There is no container - Ori Pekelman


We can move containers. Install them. Run them. Without ever knowing what was inside.


$ docker pull complex_piece_of_software:latest

$ docker run complex_piece_of_software:latest

Page 19: There is no container - Ori Pekelman


The “nuts and bolts” truth of the matter is probably the inverse. The container does not create opacity from the outside in.


Page 20: There is no container - Ori Pekelman


But from the inside out.


Page 21: There is no container - Ori Pekelman


From the system’s point of view


Page 22: There is no container - Ori Pekelman


This is the reality


Page 23: There is no container - Ori Pekelman


From the outside, the kernel, UID 0, they see all. For them, there is no container.


Page 24: There is no container - Ori Pekelman


It is from the “containerized” process’s point of view that the world changes. Becomes smaller.


Page 25: There is no container - Ori Pekelman


When we create a container what happens is that using a bunch of different Kernel features and modules (cgroups, namespaces, seccomp...) we:


Page 26: There is no container - Ori Pekelman


Limit the visibility on the outside world (namespaces)


Page 27: There is no container - Ori Pekelman


Limit the availability of resources from the outside world (cgroups)


Page 28: There is no container - Ori Pekelman


Sometimes outright lie about the world (namespaces)


Page 29: There is no container - Ori Pekelman


And we limit the capabilities of the process: what functionality it can invoke from the Kernel (seccomp… and more…)



Page 31: There is no container - Ori Pekelman


There is an operating system. In our case Linux. It abstracts away the hardware.

No software on a normal computer runs “outside” of the operating system. Yup. Even assembly / machine code. You can’t access the processor, memory or hardware without going through it. What you run on Linux are ELF binaries. Nothing else.

Your program interacts with the operating system through system calls: it can ask for memory, for access to stuff (like the network or the disk), or it can ask the operating system to run some other processes. A bunch of fun stuff.

So.. let’s create a container.


Page 32: There is no container - Ori Pekelman


Interactions with the OS pass through system calls, but sometimes the OS gets fancy and offers higher-level constructs to make things easy (like a pseudo-file-system). Most often we will use libraries and full-blown integrated apps to take care of talking to the OS. More on that later.

In Linux, processes are organized in a tree. Each process has an ID and a parent. Everything starts with 0, which is the scheduler, and 1, which is init. Everything else gets invoked from those on down.

In Linux we have three different calls to start a process: exec(), which we don’t really care about here; fork(), which copies the current process with a new PID; and clone(), which copies all or some of the current process and runs the new process as a child.
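As a rough illustration of the fork() and exec() pair described on this slide, here is a minimal Python sketch. The helper name `spawn` is made up for this example; it relies only on the standard `os` module, which wraps the underlying system calls.

```python
import os
import sys


def spawn(argv):
    """Start argv as a child process; return (child_pid, exit_code)."""
    pid = os.fork()                    # copy the current process
    if pid == 0:                       # child: same program, new PID
        try:
            os.execvp(argv[0], argv)   # swap in a new program, PID is kept
        finally:
            os._exit(127)              # only reached when exec failed
    _, status = os.waitpid(pid, 0)     # parent: wait for the child to end
    if os.WIFEXITED(status):
        return pid, os.WEXITSTATUS(status)
    return pid, -os.WTERMSIG(status)   # child was killed by a signal


if __name__ == "__main__":
    code = "import os; print('child', os.getpid(), 'parent', os.getppid())"
    child_pid, exit_code = spawn([sys.executable, "-c", code])
    print("parent", os.getpid(), "started", child_pid, "exit code", exit_code)
```

The child prints its own PID and its parent's PID, which makes the tree structure visible: the parent's PID reported by the child is the PID of the process that called fork().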

Page 33: There is no container - Ori Pekelman


So, how do we make the world seem smaller to a process?

When creating our process we can pass a couple of parameters to clone() that will tell our operating system how it is going to live.

A bunch of these parameters (or flags) are called CLONE_NEW[...SOMETHING...]. Some of them, not all, can also be changed later on using the unshare() system call.


Page 34: There is no container - Ori Pekelman


For example the parameter CLONE_NEWUTS tells the operating system that:

1. Our newly created process can call sethostname(), and doing so, instead of changing the hostname for the whole OS, records the new hostname just for that namespace.

2. So when the process later calls gethostname(), it returns whatever was set through this namespace’s sethostname().

So unlike all of its cousins and parents, this process thinks the name of the machine it is running on is different.

We tricked it! (remember the part about lying?)
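The hostname trick can be tried out with a small Python sketch. It is only an illustration under some assumptions: it uses unshare() from a forked child instead of passing flags to clone(), it adds CLONE_NEWUSER so that no root is needed (falling back to CLONE_NEWUTS alone), and the function name is invented for this example. On kernels or sandboxes that forbid this, it simply returns None.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000   # new hostname namespace
CLONE_NEWUSER = 0x10000000  # new user namespace, so root is not required

libc = ctypes.CDLL(None, use_errno=True)


def hostname_in_new_uts_namespace(name):
    """Return the hostname seen by a child in its own UTS namespace.

    Returns None when the kernel refuses to create the namespace.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            attempts = (CLONE_NEWUSER | CLONE_NEWUTS, CLONE_NEWUTS)
            if any(libc.unshare(flags) == 0 for flags in attempts):
                socket.sethostname(name)   # recorded for our namespace only
                os.write(write_end, socket.gethostname().encode())
        except OSError:
            pass                           # report nothing on failure
        finally:
            os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        seen = pipe.read().decode()
    os.waitpid(pid, 0)
    return seen or None


if __name__ == "__main__":
    print("host sees :", socket.gethostname())
    print("child sees:", hostname_in_new_uts_namespace("tricked"))
    print("host still:", socket.gethostname())
```

The parent's hostname is untouched either way, which is the whole point: only the process inside the namespace has been lied to.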

Page 35: There is no container - Ori Pekelman


Setting up namespaces


So, we create a new process and attach a namespace to it in one of three ways: at its creation, with the flags we pass to clone(); later, using the unshare() system call, which can change some of the namespaced resources; or using the setns() system call, which moves the calling process into an existing namespace.

Page 36: There is no container - Ori Pekelman


Having a different machine name per process is cool.

But not that useful right? That is not a container.

What else can we isolate?

Page 37: There is no container - Ori Pekelman


Isolating the file-system


As far as containers are concerned the most important thing is the file-system. This is done through CLONE_NEWNS.

1. First we create the new mount namespace.

2. We can then unmount the stuff from the parent namespace and mount the various things we need in our target dir (we want to get to a usable root file system).

3. Run `pivot_root $TARGETDIR $PUT_OLD` (the second argument is a directory under the new root where the old root gets parked) and voilà!

We can have different mounts and isolate parts of the file-system! As a side note, doing stuff like mounting requires “capabilities”, in this case CAP_SYS_ADMIN. More often than not these will have been dropped. So this is not always trivial.
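The first two steps can be sketched in Python. This is a hedged illustration, not a container runtime: it stops before pivot_root, it pairs CLONE_NEWNS with CLONE_NEWUSER so that an unprivileged user gets CAP_SYS_ADMIN inside the new namespace, and the function name and the marker file are invented for the example. Where the kernel or sandbox forbids any step, it returns None.

```python
import ctypes
import os
import shutil
import tempfile

CLONE_NEWNS = 0x00020000     # new mount namespace
CLONE_NEWUSER = 0x10000000   # new user namespace
MS_REC = 0x4000
MS_PRIVATE = 0x40000

libc = ctypes.CDLL(None, use_errno=True)
libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                       ctypes.c_ulong, ctypes.c_void_p]


def private_mount_is_hidden():
    """Mount a tmpfs in a new mount namespace; check the parent cannot see it.

    Returns True or False, or None when the experiment is not permitted.
    """
    target = tempfile.mkdtemp()
    marker = os.path.join(target, "inside")
    outer_uid = os.getuid()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWNS) == 0:
                with open("/proc/self/uid_map", "w") as uid_map:
                    uid_map.write(f"0 {outer_uid} 1")   # be uid 0 inside
                # Keep our mount events from propagating anywhere else.
                libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None)
                if libc.mount(b"tmpfs", target.encode(), b"tmpfs", 0, None) == 0:
                    open(marker, "w").close()   # lives on the private tmpfs
                    status = 0
        except OSError:
            pass
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    leaked = os.path.exists(marker)             # the parent's view
    shutil.rmtree(target, ignore_errors=True)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        return None
    return not leaked


if __name__ == "__main__":
    print("private mount hidden from parent:", private_mount_is_hidden())
```

The file is created on a tmpfs that exists only in the child's mount namespace, so the parent looking at the same path sees an empty directory.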

Page 38: There is no container - Ori Pekelman


We can decide what mounts are going to be shared from the “host”. We can totally decide that /var/lib is going to be common. Nothing disallows this.

We can use some crazy layered file system (like AUFS or OverlayFS) which will allow us to mix stuff, some coming from the underlying OS and some ‘overridden’ just for our namespace.

Now, “container runtimes” like Docker, LXC or runc are largely about preparing a filesystem image that can be mounted so that a process can run in it. If you look at the OCI (Open Container Initiative), it has two specs: one for this, the filesystem image, and one for the runtime.

Page 39: There is no container - Ori Pekelman


Isolating Inter-Process Communications


With CLONE_NEWIPC we limit our process’s ability to send and receive messages: it can only talk to other processes in the same namespace.

We don’t want our nice isolated process to talk with strangers, right?

Page 40: There is no container - Ori Pekelman


Isolate Process IDs!

This (CLONE_NEWPID) is how, when you run ps -aux, you only see the processes in your own namespace and its children (the PIDs won’t match those seen from outside. This is complex).

Oops, I forgot to tell you: namespaces are hierarchical. Which is triple fun. So yes, containers can run inside other containers ad infinitum (really up to 32 levels, but, well, you know, details).
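A small Python sketch can show the PID mismatch. It is an illustration under assumptions: it uses unshare() with CLONE_NEWPID (plus CLONE_NEWUSER so it can run unprivileged, with a fallback to CLONE_NEWPID alone), and the function name is made up. Note that unshare() does not move the caller itself; only children forked afterwards are born in the new PID namespace.

```python
import ctypes
import os

CLONE_NEWPID = 0x20000000    # new PID namespace
CLONE_NEWUSER = 0x10000000   # new user namespace, so root is not required

libc = ctypes.CDLL(None, use_errno=True)


def pid_seen_inside_new_namespace():
    """Return the PID a process sees for itself inside a fresh PID namespace.

    Returns None when the kernel refuses to create the namespace.
    """
    read_end, write_end = os.pipe()
    helper = os.fork()
    if helper == 0:
        os.close(read_end)
        try:
            attempts = (CLONE_NEWUSER | CLONE_NEWPID, CLONE_NEWPID)
            if any(libc.unshare(flags) == 0 for flags in attempts):
                inner = os.fork()          # first process of the namespace
                if inner == 0:
                    os.write(write_end, str(os.getpid()).encode())
                    os._exit(0)
                os.waitpid(inner, 0)       # outside, `inner` is a normal PID
        except OSError:
            pass
        finally:
            os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        seen = pipe.read().decode()
    os.waitpid(helper, 0)
    return int(seen) if seen else None


if __name__ == "__main__":
    print("our PID on the host:", os.getpid())
    print("PID seen inside    :", pid_seen_inside_new_namespace())
```

The process inside believes it is PID 1, the init of its own small world, while the host knows it under an ordinary PID.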


Page 41: There is no container - Ori Pekelman


Isolating the Network

This (CLONE_NEWNET) is how your container gets its own IP. Yay, now it is a big boy.

(We won’t get into this, but this is also where a lot of suffering will happen. Remember, from the Kernel’s perspective this is just another interface. We will need either NAT, weird bridging or some creative uses of iptables to make sense of things. And this is clearly where we see how higher-level abstractions are a necessity.)
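To get a feel for how empty a fresh network namespace is, here is a hedged Python sketch. It assumes unshare() with CLONE_NEWNET (plus CLONE_NEWUSER for unprivileged use, with a fallback), and the function name is invented. It does no NAT or bridging; it only lists the interfaces the child can see.

```python
import ctypes
import os
import socket

CLONE_NEWNET = 0x40000000    # new network namespace
CLONE_NEWUSER = 0x10000000   # new user namespace, so root is not required

libc = ctypes.CDLL(None, use_errno=True)


def interfaces_in_new_network_namespace():
    """Return the interface names visible inside a fresh network namespace.

    Returns None when the kernel refuses to create the namespace.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            attempts = (CLONE_NEWUSER | CLONE_NEWNET, CLONE_NEWNET)
            if any(libc.unshare(flags) == 0 for flags in attempts):
                names = [name for _, name in socket.if_nameindex()]
                os.write(write_end, ",".join(names).encode())
        except OSError:
            pass
        finally:
            os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        seen = pipe.read().decode()
    os.waitpid(pid, 0)
    return seen.split(",") if seen else None


if __name__ == "__main__":
    print("host :", [name for _, name in socket.if_nameindex()])
    print("child:", interfaces_in_new_network_namespace())
```

The child typically sees little more than a loopback interface, which is why a runtime then has to wire up virtual interfaces, bridges or NAT before the container can talk to anything.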


Page 42: There is no container - Ori Pekelman


Isolate User and group IDs

This (CLONE_NEWUSER) is oh so important for unprivileged containers.

Yes! Linux supports doing all of this from userspace.

This basically means that the uid running inside does not exist outside. And that your process can feel blessedly aloof.


Page 43: There is no container - Ori Pekelman


man namespaces: USER_NAMESPACES(7)


A process's user and group IDs can be different inside and outside a user namespace. In particular, a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.

This means quantum-state rootness! You are root and unprivileged at the same time!
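The two-uids-at-once effect can be observed with a short Python sketch. It is an illustration under assumptions: it calls unshare(CLONE_NEWUSER) in a forked child and then writes a one-line mapping to /proc/self/uid_map, and the function name is invented. Where user namespaces are disabled, it returns None.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000   # new user namespace

libc = ctypes.CDLL(None, use_errno=True)


def uid_inside_new_user_namespace():
    """Return the uid a child sees after mapping itself to 0 in a new user namespace.

    Returns None when the kernel refuses to create the namespace.
    """
    outer_uid = os.getuid()
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            if libc.unshare(CLONE_NEWUSER) == 0:
                # "inside-uid outside-uid count": our real uid becomes uid 0 inside.
                with open("/proc/self/uid_map", "w") as uid_map:
                    uid_map.write(f"0 {outer_uid} 1")
                os.write(write_end, str(os.getuid()).encode())
        except OSError:
            pass
        finally:
            os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        seen = pipe.read().decode()
    os.waitpid(pid, 0)
    return int(seen) if seen else None


if __name__ == "__main__":
    print("uid outside:", os.getuid())
    print("uid inside :", uid_inside_new_user_namespace())
```

Inside the namespace getuid() reports 0, while every check the host kernel makes outside the namespace still uses the original unprivileged uid.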

Page 44: There is no container - Ori Pekelman


Each process is a member of exactly one user namespace. A process created via fork or clone without the CLONE_NEWUSER flag is a member of the same user namespace as its parent. A single-threaded process can join another user namespace with setns if it has the CAP_SYS_ADMIN in that namespace; upon doing so, it gains a full set of capabilities in that namespace.

Page 45: There is no container - Ori Pekelman


Almost last, but not least. Isolate resources!

This is where this ties in to the earlier mechanism we were talking about, cgroups.

cgroups allow us to limit the resource usage of the process (and its children) in terms of memory, CPU usage and IO. CLONE_NEWCGROUP adds a cgroup namespace on top, so the process only sees its own part of the cgroup hierarchy.


Page 46: There is no container - Ori Pekelman


Really last: isolate all the things and the Kernel.

This is of unholy complexity. Short story: Linux used to be mostly all or nothing. User 0 vs. the others. Now you have capabilities. A long list of capabilities. Which you can now go and set per process. And you have stuff like seccomp and seccomp-bpf to help you do just that.

And you can use a bunch of modules and kernel patch sets to make everything more robust. Like SELinux. GRSecurity. Or AppArmor.


Page 47: There is no container - Ori Pekelman


seccomp


Seccomp is a mechanism in the Linux kernel that allows a process to make a one-way transition to a restricted state where it can only perform a limited set of system calls.

If a process attempts any other system calls, it is killed via a SIGKILL signal. In its most restrictive mode, seccomp prevents all system calls other than read(), write(), _exit(), and sigreturn().

This would allow a program to initialize and then drop into a restricted mode where it could only read from/write to already-opened files.
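The one-way transition can be demonstrated with a hedged Python sketch. It assumes the prctl() interface with PR_SET_SECCOMP and SECCOMP_MODE_STRICT, called through ctypes in a throwaway child; the function name and the returned strings are invented for this example.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22        # prctl() option that enables seccomp
SECCOMP_MODE_STRICT = 1    # only read, write, _exit and sigreturn remain

libc = ctypes.CDLL(None, use_errno=True)


def forbidden_syscall_outcome():
    """Enter strict seccomp in a child, then make a system call that is not allowed.

    Returns "killed" if the child died from SIGKILL, else "not restricted".
    """
    pid = os.fork()
    if pid == 0:
        if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
            os._exit(2)    # seccomp unavailable here: nothing was restricted
        # From here on, any system call outside the short allowed list
        # (getppid is one such call) gets this process killed.
        os.getppid()
        os._exit(3)        # not expected to be reached
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    return "not restricted"


if __name__ == "__main__":
    print("child after a forbidden system call:", forbidden_syscall_outcome())
```

There is no way back out of strict mode, which is why the experiment runs in a forked child rather than in the main process.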

Page 48: There is no container - Ori Pekelman


seccomp-bpf


If seccomp is a sledgehammer, seccomp-bpf is the fine-grained version that allows specifying a filter that is applied to every system call.

Page 49: There is no container - Ori Pekelman


Looking under the hood

BTW, you get a nice pseudo filesystem with which you can interact to control these values.

Try (with the PID of any running process in place of 8 and 800):

sudo ls -lai /proc/8/ns/

cat /proc/800/cgroup
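The same information can be read programmatically. The following Python sketch is only an illustration; the two helper names are made up, and it defaults to inspecting the current process through /proc/self so that no sudo is needed.

```python
import os


def namespaces_of(pid="self"):
    """Map each namespace type to its identifier, e.g. 'uts:[4026531838]'."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}


def cgroups_of(pid="self"):
    """Return the lines of /proc/<pid>/cgroup, one per cgroup hierarchy."""
    with open(f"/proc/{pid}/cgroup") as cgroup_file:
        return [line.rstrip("\n") for line in cgroup_file]


if __name__ == "__main__":
    for name, ident in namespaces_of().items():
        print(f"{name:20} {ident}")
    for line in cgroups_of():
        print(line)
```

Two processes are in the same namespace exactly when the identifiers match, so comparing the output for a containerized process with that of your shell shows which parts of the world have been swapped out.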


Page 50: There is no container - Ori Pekelman


Unlike other isolation techniques (Solaris Zones, BSD Jails, VMs) this is an emergent thing


This is not a “first class” citizen. This was not designed. Different projects assemble different types of isolation that have different semantics from all of these elements.

● Docker is about packaging a single executable.

● LXC wants to give you what feels like a virtual machine.

● FireJail is there as a sandbox to run stuff you don’t trust. GUI much.

And this is a recent thing: user namespaces appeared in Kernel release 3.8 on 18 Feb 2013.


Page 52: There is no container - Ori Pekelman


Quite far away from our intuitive abstraction popularized by Docker™


Page 53: There is no container - Ori Pekelman


Everything in this world is “race-condition” prone, and much of it, because of the mix of tooling, is complex and hard.

Creating a Linux container, or “containerization”, is using these different mechanisms together in a coherent way so that the end result “feels” as if the process you are running is in an isolated machine.

A container runtime is a packaging of the above to make it simple.

The signatures and semantics of cgroups, namespaces and seccomp are different.


Page 54: There is no container - Ori Pekelman


Container runtimes try to turn something that really looks more like this


Page 55: There is no container - Ori Pekelman


Into our abstract image


Page 56: There is no container - Ori Pekelman


Containers as an abstraction

When you think about all these low-level knobs we can control (the machine name, the network interfaces, the file-system, the users etc.) you see something else emerging.

When we define how to “containerize” a piece of software we are extracting its contract.

We are defining the minimal subset of resources it needs.

And the minimal understanding of that piece of software that the runtime requires to reliably run it.


Page 57: There is no container - Ori Pekelman


The simple Docker Contract

There were other isolation techniques before Docker. But because Docker exposed such a simple contract, it gained the incredible traction it had.

According to Docker the contract of a piece of software was:

● A base image (a state of a file-system). Itself can be layered.

● A working directory.

● A build step (which was basically a bash script).

● A TCP port exposed to the world.

● Environment variables.

● A command to run.
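One way to see how small this contract is: model it as a data structure. The Python sketch below is purely illustrative. The class name, field names and example values are invented here; only the Dockerfile instructions it renders (FROM, WORKDIR, ENV, RUN, EXPOSE, CMD) are real.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DockerContract:
    """The six items of the simple Docker contract (hypothetical model)."""
    base_image: str
    workdir: str
    build_steps: list = field(default_factory=list)
    port: int = 80
    env: dict = field(default_factory=dict)
    command: list = field(default_factory=list)

    def to_dockerfile(self):
        """Render the contract as Dockerfile text."""
        lines = [f"FROM {self.base_image}", f"WORKDIR {self.workdir}"]
        lines += [f"ENV {key}={value}" for key, value in self.env.items()]
        lines += [f"RUN {step}" for step in self.build_steps]
        lines.append(f"EXPOSE {self.port}")
        lines.append("CMD " + json.dumps(self.command))
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    contract = DockerContract(
        base_image="example/base:1.0",     # hypothetical image name
        workdir="/app",
        build_steps=["./build.sh"],        # hypothetical build script
        port=8080,
        env={"APP_ENV": "production"},
        command=["./run-app"],             # hypothetical entry point
    )
    print(contract.to_dockerfile())
```

Everything the runtime knows about the software fits in those six fields. Nothing in them says what the software depends on or what data it needs.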


Page 58: There is no container - Ori Pekelman


Choosing a contract

The incredible success it had shows that the Docker software and the Docker contract were good enough. And good enough is good. Sometimes great.

At Platform.sh we run a container-based PaaS and we chose not to use Docker.

● Partly because the nuts and bolts at the time didn’t fit (it was too new/buggy for production in 2013/14). No user namespaces until two months ago. No immutability. Weird networking.

● Partly because we thought the contract wasn’t correct for our use-case.


Page 59: There is no container - Ori Pekelman


Is it an efficient contract?

● The idea of mutable, layered base-images made creating the first generation of Docker containers easy. Which explains a lot of its popularity. So yes.

● But it is a messy thing. This is something Docker has advanced on by allowing immutable containers. Still, the default is that the container is mutable. And this is what the ecosystem looks like.

● Build-oriented, reproducible, semantic base-images allow for orders of magnitude better memory utilisation through deduplication, and orders of magnitude simpler operations. This is not something you can easily bolt on later. There is still strong inertia here.


Page 60: There is no container - Ori Pekelman


For some software (most software we cared about) this contract doesn’t really make sense. Not in the long run. Not at scale.

In order to be useful, the contract that describes software also needed to describe:

○ How to build it

○ Everything it depends on (you can’t run WordPress without MySQL)

○ Its initial data structures (you can’t run WordPress without some data in the MySQL)

○ Its basic configuration (most software needs to understand some things about its place in the world)


Page 61: There is no container - Ori Pekelman


These days there are a billion and one projects that add those capabilities

○ And first, of course, the Kubernetes ecosystem.

○ But using 30 different tools strung together doesn’t scream “abstraction” to us, but more like a DIY mess. And it hardly answers the questions:

■ What is the minimal subset of resources an app needs?

■ How can we make it run, reliably?


Page 62: There is no container - Ori Pekelman


The obligatory XKCD 435


○ If our intuition is correct, and the minimal viable contract to run “arbitrary” software contains these other things, if the useful level to reason about software is the molecule, not the atom, then we need an organic chemistry set, not a physics set.

○ It doesn’t mean physics is wrong. Or that Docker is bad software.

Page 63: There is no container - Ori Pekelman


What would be a perfect contract for us?

● RO / immutable base-image that is not opaque

○ A semantic representation of system-libraries (with lock files)

○ A reproducible, semantic, build system (with lock files)

○ Potentially, a build step (which can basically be a bash script).

● RW / mutable base-image (mutable state) - which is Content Addressable

● Mapping of working directories to the RW image.

● A list of exposed network protocols and their parameters

● Build time environment variables / Run time environment variables

● Relationships to other containers, which should be semantic themselves (some containers make no sense, and would not run, without a database).

● The capability to understand change (diff as part of the model).


Page 64: There is no container - Ori Pekelman


Why are abstractions important?

● Because we chose a container description system that does not depend on the containerization method, we can swap out that part later, and this is a domain where everything moves fast. Shiny new becomes legacy in 6 months.

○ Our reproducible build system can create our base LXC systems (which we use in production), our VMs (which we also deploy when we need higher levels of isolation) or Docker images (which we use in our GitLab-based CI system).

● Because we went for read-only containers separated from the R/W mounts, we have gained factors in terms of density thanks to the level of memory deduplication.


Page 65: There is no container - Ori Pekelman


Why are abstractions important?

● Because we are describing the “minimal application” not as a single process but as a graph, and because we understand the protocol layer interactions and what writes where to disk, we can have consistent operations over the cluster that are fast and safe.

● Which also means we do not suffer from the same limitations around running persistent services.

● It is easier to implement HA primitives when you understand who is writing to the disk and how, who has what ports opened etc.

● When your base system is not .yaml but .yaml + git, and when your .yaml represents something that has meaning, you can implement change with much less friction.


Page 66: There is no container - Ori Pekelman

Platform.sh can clone an arbitrarily complex production cluster in less than a minute.

With all of the data.

To create ephemeral staging clusters on the fly.

Every branch gets a url with basically fail-proof deployments.

Page 67: There is no container - Ori Pekelman

Git-driven infrastructure

With a single git push you can deploy an arbitrarily complex cluster (with micro-services, message queues and the lot).

Backup means a consistent point-in-time snapshot of the whole shebang.

Page 68: There is no container - Ori Pekelman

Automatically redundant architecture

High-Performance, automatic high-availability


Page 70: There is no container - Ori Pekelman


There is no container but the cluster


● This is a bonus slide in case I didn’t run out of time, which is fun as I had 66 slides for 30 minutes.

● At the beginning of our project we used the word Cluster to describe, well, half of the different primitives we had. But then it all became murky. So we started calling stuff Cluster, Kluster and Claster. Which stuck for a little bit but faded back again.

● Now cluster is back in all its glory, and a bit like with Hebrew, my mother tongue, well, people just seem to be able to guess the correct meaning of cluster from the context.

● Oh we should really refresh that cluster.

Page 71: There is no container - Ori Pekelman


I am @OriPekelman everywhere


Questions ?