not so brief history of linux containers

Upload: kirill-kolyshkin

Post on 15-Apr-2017

A brief history of Linux Containers

Kir Kolyshkin ContainerCon, Seattle, 17th of August 2015

I like that this is a nested talk; it's like a novel within a book, or a story within a story. What I don't like is that it's only 15 minutes: I've got so much to tell you!

A (not so) brief history of (mostly) Linux Containers

Evolution of OS

Single process → batch processing → multitasking

Single user → multiple users and groups

Single computer → network of computers

Single userspace → multiple userspaces, a.k.a. containers

What year do you think the history of Linux Containers started?

So, this is the first ContainerCon. When do you think the history of containers started for Linux?

1999-2000

1999: Initial idea about Virtuozzo

virtual environments: groups of processes

a file system to share code / save RAM

resource management / isolation

2000: 5 engineers, public testing, 5000 VEs with root accounts, public source code release

Disclaimer: I work for Odin (ex Parallels, ex SWsoft), my POV is skewed.

Our chief scientist, Alexander Tormasov, a professor at MIPT (~ru MIT), proposed a new direction to senior management: lightweight partitioning. He was inspired by IBM mainframe partitioning. The idea is to have multiple virtual environments, isolated groups of processes, each acting as a standalone Linux machine (except that the kernel is shared). Another idea was a file system that shares code (binaries/libraries) and therefore saves RAM, making density even higher. The third cornerstone was resource isolation.

In Feb 2000 they got an office at MIPT: 3 engineers, a sysadmin, and a manager/engineer. Later, two more guys joined for web management tools. Initial public testing, hot summer 5000 VEs,

2000

User Beancounters: per-process-group limits

Andrey Savochkin and Alan Cox

barrier, limit, held, maxheld, failcnt

Al Viro: [mount] namespace
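The five User Beancounter fields named on this slide can be illustrated with a toy model. This is only a sketch, not the kernel code: the class and method names are made up for illustration, and the rule that a "strict" charge is checked against the barrier while a non-strict one may grow up to the limit is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Beancounter:
    """Toy model of one User Beancounter resource (field names from the slide)."""
    barrier: int      # soft threshold
    limit: int        # hard ceiling
    held: int = 0     # current usage
    maxheld: int = 0  # peak usage seen so far
    failcnt: int = 0  # number of refused charges

    def charge(self, amount: int, strict: bool = True) -> bool:
        """Try to account `amount` more units of the resource.

        Assumption of this sketch: a strict charge must stay within
        `barrier`, a non-strict one may go up to `limit`.
        """
        ceiling = self.barrier if strict else self.limit
        if self.held + amount > ceiling:
            self.failcnt += 1  # refused: only the failure counter changes
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount: int) -> None:
        """Release previously charged units; maxheld keeps the peak."""
        self.held = max(0, self.held - amount)
```

A refused charge leaves held untouched and only bumps failcnt, which is why a non-zero failcnt is the first thing to look for when an application inside a VE misbehaves.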

That initial testing revealed a big problem with resource isolation. A mathematician from MSU (~ru Stanford) was hired, and he wrote User Beancounters (with Alan Cox; the luid idea came from HP-UX). WARNING: PhD in economics!

Also in 2000, Al Viro wrote the first namespace for the Linux kernel: the [mount] namespace. It's like chroot() but with bells and whistles. The kernel API is the clone() call with the CLONE_NEWNS flag.
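To get a feel for the CLONE_NEWNS flag, here is a minimal sketch. Note the swap: it does not call clone() as the talk describes, but the related unshare(2) system call, which accepts the same flag and is easier to reach from Python through ctypes. The function name is hypothetical, and the call normally needs root (CAP_SYS_ADMIN), so the sketch returns an errno value instead of raising.

```python
import ctypes
import errno
import os

CLONE_NEWNS = 0x00020000  # flag value from <linux/sched.h>


def unshare_mount_namespace() -> int:
    """Move the calling process into a new mount namespace.

    Returns 0 on success, otherwise an errno value (typically EPERM
    without CAP_SYS_ADMIN, ENOSYS where unshare(2) is not available).
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        unshare = libc.unshare
    except (OSError, AttributeError, TypeError):
        return errno.ENOSYS  # not Linux, or libc lacks unshare()
    if unshare(CLONE_NEWNS) == 0:
        return 0
    return ctypes.get_errno() or errno.EPERM


if __name__ == "__main__":
    rc = unshare_mount_namespace()
    if rc == 0:
        print("now in a new mount namespace")
    else:
        print("unshare failed:", os.strerror(rc))
```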

2001

Virtuozzo for Windows!

no source code, lots of reverse engineering

live kernel patching

most advanced software ever written for Windows

Linux-VServer project

Jacques Gélinas, Herbert Pötzl

2002-2003

2002 Jan: First Virtuozzo release (v2.0)

2003: Meiosys MetaCluster

containers for the sake of live migration

acquired by IBM in 2005

2004-2005

Feb: Solaris Zones/Containers released

kudos to Sun for the term containers!

Dec: first Virtuozzo for Windows release

CKRM, a resource management framework from IBM [FAIL]

2005: OpenVZ project announced

better late than never

2006-2010: up the stream!

Lots of new namespaces:

PID (process tree)

net (net devices, addresses, routing etc)

IPC (shared memory, semaphores, msg queues)

UTS (hostname, kernel version)

Mount (filesystem mounts and files, 2000)

user (UIDs/GIDs, only completed in 2013, Linux 3.9)

Use: clone() with CLONE_NEW* flags
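The namespaces listed above can be seen from userspace without any privileges. The sketch below pairs each namespace with its CLONE_NEW* flag value (taken from <linux/sched.h>) and reads the identifiers the kernel exposes under /proc/<pid>/ns. The dictionary and function names are made up for this example, and on systems without that directory the function simply returns an empty result.

```python
import os

# CLONE_NEW* flag values from <linux/sched.h>, keyed by the name of the
# matching entry in /proc/<pid>/ns.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def current_namespaces(pid: str = "self") -> dict:
    """Map namespace name -> identifier, e.g. 'mnt' -> 'mnt:[<inode>]'.

    Returns an empty dict where /proc/<pid>/ns does not exist.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in names:
        if name not in CLONE_FLAGS:
            continue  # skip namespace types not covered on the slide
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # entry is not a readable symlink on this kernel
    return result


if __name__ == "__main__":
    for name, ident in sorted(current_namespaces().items()):
        print(f"{name:5} flag=0x{CLONE_FLAGS[name]:08x} {ident}")
```

Two processes are in the same namespace exactly when the identifiers match, so comparing this output inside and outside a container shows which namespaces the container runtime actually created.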

As a result of OpenVZ upstreaming efforts, a few more namespaces appeared in the Linux kernel. The most notable ones are netns and pidns. Netns was developed by the OpenVZ kernel guys, based on their experience with the OVZ kernel but written from scratch. For pidns there were two implementations, one from IBM and one from us; ours won as it had zero overhead on the first level of nesting.

The user namespace was all IBM work. It was initially merged in 2.6.23 (2007), but was only completed (became usable) in Linux 3.9 (2013).

We failed to upstream our User Beancounters, but Google contributed the cgroups framework (an adaptation of the cpusets feature from BULL/Silicon Graphics).

As stuff became available in the kernel, userspace tools emerged. LXC is such a tool, from IBM.

2006-2010: up the stream!

This time period was characterized by lots of container-related patches contributed to the Linux kernel, i.e. the upstreaming age. Our company is a few hundred people, and our kernel team is only about 10 people, give or take, and I am very proud of the fact that this upstreaming effort made us appear in the top 10 companies contributing to the Linux kernel. Well, it's the bottom of that top 10, that is. Other companies in that list are way bigger.

Now, upstreaming is probably as complicated for developers as it is for salmon when they run. They die exhausted, they get eaten by grizzly bears, etc. On the right you can see a salmon, err, a developer, and on the left is a bear, err, a Linux kernel subsystem maintainer.

2006

Kernel ports: 2.6.15, FC5, RHEL4, 2.6.18

Weekend project: ports to SPARC and Power

Live migration in OpenVZ

Checkpointing and Live Migration

Live migration, simplified:

freeze processes, dump their complete state

copy that dump to other machine

restore from dump; unfreeze!

Initially implemented in the kernel

touches every subsystem (except drivers)

so, really hard to merge upstream

Trying hard to merge cpt/rst
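The simplified live migration steps on this slide (freeze and dump, copy, restore and unfreeze) can be walked through with a toy model. This is purely illustrative: ToyProcess and all function names are invented here, "state" is just a dictionary, and the real cpt/rst code has to capture memory, files, sockets and much more.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ToyProcess:
    """Stand-in for a process: a PID plus some state to carry over."""
    pid: int
    memory: dict = field(default_factory=dict)
    frozen: bool = False


def checkpoint(procs):
    """Freeze every process, then dump the complete state to bytes."""
    for p in procs:
        p.frozen = True  # nothing may change while we dump
    return json.dumps([asdict(p) for p in procs]).encode()


def restore(image):
    """Rebuild the processes from a dump and unfreeze them."""
    procs = [ToyProcess(**d) for d in json.loads(image)]
    for p in procs:
        p.frozen = False
    return procs


def migrate(procs, send):
    """freeze + dump -> copy to the other machine -> restore + unfreeze."""
    image = checkpoint(procs)
    received = send(image)  # `send` stands in for the network copy
    return restore(received)


if __name__ == "__main__":
    source = [ToyProcess(1, {"counter": 41}), ToyProcess(2)]
    target = migrate(source, lambda image: bytes(image))
    print(target)
```

The processes on the source side stay frozen for the whole copy, which is the downtime a live migration implementation tries to minimize.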

2007

IBM AIX WPARs

HP-UX SRP containers

Rebase to RHEL5 kernel, port to 2.6.20

2007: cgroups framework from Google [PASS]

based on cpusets feature from BULL/SGI

CGroups

Cgroups is a mechanism to control resources for hierarchical groups of processes

Modern alternative to user beancounters

Cgroups is nothing without controllers:

blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio

Cgroups are orthogonal to namespaces

Still working on it: just added kmem controller
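Which cgroups a process belongs to, and which controllers are attached to each hierarchy, can be read from /proc/<pid>/cgroup. Each line there has the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses that format; the function names are made up for this example, and it returns an empty list where the file is missing.

```python
def parse_proc_cgroup(text: str) -> list:
    """Parse /proc/<pid>/cgroup lines of the form 'ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path may itself contain ':', so split at most twice.
        hier_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hier_id),
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


def my_cgroups() -> list:
    """Cgroup membership of the current process ([] if unavailable)."""
    try:
        with open("/proc/self/cgroup") as f:
            return parse_proc_cgroup(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for entry in my_cgroups():
        names = ",".join(entry["controllers"]) or "-"
        print(entry["hierarchy"], names, entry["path"])
```

The output makes the "orthogonal to namespaces" point concrete: the same process has one entry per hierarchy here, independent of which namespaces it lives in.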

2008-2009

Kernel port to 2.6.25

Weekend project: port to ARM

LXC (userspace tool a la vzctl) was born

What is LXC?

At first glance, very similar to OpenVZ

In fact LXC is just a user space tool a la vzctl

LXC uses standard kernel

OpenVZ is a complete set with its own kernel, many tools, libraries etc.

A superset of OpenVZ also exists
as a commercial product (Virtuozzo)

2010

Port to RHEL6

VSwap (RAM/swap limits, simplified UBC)

ploop aka CT filesystem in a file

on-demand allocation

instant snapshots

online resize, merge, compact

write tracker (improved live migration)

2011-2012: CRIU

Jul 2011: initial proposal for CRIU

Idea: implement most of C/R in userspace using existing APIs

Jul 2012: initial CRIU release (v0.1)

criu.org
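Because CRIU lives in userspace, checkpoint and restore come down to running the criu command-line tool. The helpers below only build the argument lists and run nothing; the function names are invented for this sketch. The options used (-t for the process tree root, -D for the images directory, --leave-running, --tcp-established) are standard criu options, but an actual run needs criu installed and usually root.

```python
def criu_dump_cmd(pid: int, images_dir: str,
                  leave_running: bool = False,
                  tcp_established: bool = False) -> list:
    """Build the argv for checkpointing the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if leave_running:
        cmd.append("--leave-running")   # keep the tree alive after the dump
    if tcp_established:
        cmd.append("--tcp-established")  # also dump established TCP sockets
    return cmd


def criu_restore_cmd(images_dir: str,
                     tcp_established: bool = False) -> list:
    """Build the argv for restoring from the images in `images_dir`."""
    cmd = ["criu", "restore", "-D", images_dir]
    if tcp_established:
        cmd.append("--tcp-established")
    return cmd


if __name__ == "__main__":
    # The PID and directory are placeholders for illustration only.
    print(" ".join(criu_dump_cmd(1234, "/tmp/ckpt", leave_running=True)))
    print(" ".join(criu_restore_cmd("/tmp/ckpt")))
```

The resulting lists can be passed to subprocess.run() as they are.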

2013

Docker appeared

lmctfy appeared

CoreOS appeared

vzctl adds io/iops limit support

What is Docker?

Docker: containers runtime + app delivery + ...

Docker CTs are apps, OpenVZ CTs are systems

Extremely popular

Docker uses upstream/vanilla/standard kernel,
while OpenVZ provides a custom one

Docker is a middleware, OpenVZ is full stack

See also: CoreOS, Rocket

2014

CRIU for Docker & LXC support

LXD announced

OpenStack talks about adding container support

OpenVZ in 2015

New, more open development model

Unified with Virtuozzo

Plays well with Docker (in, out, and on the side)

Virtuozzo 7 is a reboot of OpenVZ. Ten years ago we made the mistake of not having our development process open enough; this time we are trying to fix it. This April we opened our next kernel git repo, and just this Monday we opened our toolchain. We also moved all of our discussions to the public mailing list, and we follow the git fork-branch-pull request model for developing our tools.

The other thing is next-gen resource management. It's more dynamic, with a user-space daemon which would allow bursts, guarantees and, in general, more elastic limits.

CRIU in 2015

3 years old, tools at version 1.6.2

Users: Google, Samsung, Huawei, ...

LXC & Docker integrated!

TCP connection migration works!

About 160 patches merged to 3.x - 4.x kernels
under CONFIG_CHECKPOINT_RESTORE

Live migration: p.haul (criu.org/P.Haul)

Future!

Virtuozzo 7

4th gen of resource management: vcmmd

More dynamic, with bursts, guarantees etc

Proper port to POWER, ARM

CRIU: p.haul, integration (http://criu.org/Integration)

MetaPC? Mosaic?

We will probably be working on proper ARM and POWER ports (the improper ones were done by me years ago just to demonstrate that container technology is arch-agnostic). The only arch-dependent feature is CPT/RST, as it requires deep knowledge of the architecture to develop. CRIU is currently ported to ARM.

Finally, MetaPC is something we're thinking about: a way to combine many servers into a single big virtual one. This is anti-partitioning, and it will work with the help of CRIU.


@kolyshkin

@_openvz_

@__criu__

openvz.org/Contacts

Booth 333 (third floor, far right corner)