not so brief history of linux containers
A brief history of Linux Containers
Kir Kolyshkin ContainerCon, Seattle, 17th of August 2015
I like that this is a nested talk; it's like a novel within a book or a story within a story. I don't like that it's only 15 minutes; I've got so much to tell you!
A (not so) brief history of (mostly) Linux Containers
Evolution of OS
Single process → batch processing → multitask
Single user → multiple users and groups
Single computer → network of computers
Single userspace → multiple userspaces
a.k.a. containers
What year do you think the history of Linux Containers started?
So, this is the first ContainerCon. When do you think the history of containers started for Linux?
1999-2000
1999: Initial idea about Virtuozzo
virtual environments (groups of processes)
a file system to share code / save RAM
resource management / isolation
2000: 5 engineers, public testing, 5000 VEs with root accounts, public source code release
Disclaimer: I work for Odin (ex Parallels, ex SWsoft), my POV is skewed.
Our chief scientist, Alexander Tormasov, a professor from MIPT (~ru MIT), proposed a new direction to senior management: lightweight partitioning. He was inspired by IBM mainframe partitioning. The idea is to have multiple virtual environments, isolated groups of processes, each acting as a standalone Linux machine (except that the kernel is shared). Another idea was a file system that shares code (binaries/libraries) and therefore saves RAM, making density even higher. The third cornerstone was resource isolation.
In Feb 2000 they got an office at MIPT: 3 engineers, a sysadmin, a manager/engineer. Later, two guys for web management tools. Initial public testing, hot summer, 5000 VEs,
2000
User Beancounters: per-process-group limits
Andrey Savochkin and Alan Cox
barrier, limit, held, maxheld, failcnt
Al Viro: [mount] namespace
That initial testing revealed a big problem with resource isolation. A mathematician from MSU (~ru Stanford) was hired; he wrote User Beancounters (with Alan Cox; the luid idea came from HP-UX). WARNING: PhD in economics!
Also in 2000, Al Viro wrote the first namespace for the Linux kernel: the [mount] namespace. It's like chroot() but with bells and whistles. The kernel API is the clone() call with the CLONE_NEWNS flag.
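The User Beancounters slide lists five fields per resource: barrier, limit, held, maxheld, failcnt. Here is a minimal Python sketch of how such a counter could behave. It is a toy model, not the OpenVZ kernel code, and it assumes the simplest reading: a charge that would push held above limit is refused and counted in failcnt, and maxheld tracks the peak. How the barrier is enforced differs per resource in real UBC, so here it is only reported. The class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Beancounter:
    """Toy model of one per-group resource counter (hypothetical, not kernel code)."""
    barrier: int      # soft threshold; real UBC enforcement is resource-specific
    limit: int        # hard ceiling: charges that would exceed it are refused
    held: int = 0     # current usage
    maxheld: int = 0  # peak usage seen so far
    failcnt: int = 0  # number of refused charges

    def charge(self, amount: int) -> bool:
        """Try to account `amount` units; return False if the charge is refused."""
        if self.held + amount > self.limit:
            self.failcnt += 1
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount: int) -> None:
        """Release previously charged units (never drops below zero)."""
        self.held = max(0, self.held - amount)

    @property
    def over_barrier(self) -> bool:
        """True while current usage is above the soft threshold."""
        return self.held > self.barrier


if __name__ == "__main__":
    bc = Beancounter(barrier=80, limit=100)
    bc.charge(90)   # accepted, usage is now above the barrier
    bc.charge(20)   # refused: 90 + 20 would exceed the limit
    print(bc)
```

The point of failcnt is that an admin can see after the fact which limit a container kept hitting, without having to catch the failure as it happens.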
2001
Virtuozzo for Windows!
no source code, lots of reverse engineering
live kernel patching
most advanced software ever written for Windows
Linux-VServer project
Jacques Gélinas, Herbert Pötzl
2002-2003
2002 Jan: First Virtuozzo release (v2.0)
2003: Meiosys Metacluster
containers for the sake of live migration
acquired by IBM in 2005
2004-2005
Feb: Solaris Zones/Containers released
kudos to Sun for the term "containers"!
Dec: first Virtuozzo for Windows release
CKRM, a resource management framework from IBM [FAIL]
2005: OpenVZ project announced
better late than never
2006-2010: up the stream!
Lots of new namespaces:
PID (process tree)
net (net devices, addresses, routing etc)
IPC (shared memory, semaphores, msg queues)
UTS (hostname, kernel version)
Mount (filesystem mounts and files, 2000)
user (UIDs/GIDs, only completed in 2013, Linux 3.9)
Use: clone() with CLONE_NEW* flags
As a result of OpenVZ upstreaming efforts, a few more namespaces appeared in the Linux kernel. The most notable ones are netns and pidns. Netns was developed by OpenVZ kernel guys, based on their experience with the OVZ kernel but written from scratch. For pidns there were two implementations, one from IBM and one from us; we won, as ours had zero overhead on the first level of nesting.
User namespace was all IBM work; it was initially merged in 2.6.23 (2007), but was only completed (became usable) in Linux 3.9 (2013).
We failed to upstream our User Beancounters, but Google contributed the cgroups framework (an adaptation of the cpusets feature from BULL/Silicon Graphics).
As stuff became available in the kernel, userspace tools emerged. LXC is one such tool, from IBM.
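To make the "clone() with CLONE_NEW* flags" line concrete, here is a small Python sketch that combines the flags for the six namespaces listed on the slide. The numeric values are the ones from the Linux header <linux/sched.h>. Python has no direct clone() wrapper, so the helper below swaps in the related unshare(2) call through ctypes; it takes the same flags but moves the calling process into new namespaces instead of creating a child. The function and dictionary names are made up for illustration, and actually calling unshare() normally needs root (CAP_SYS_ADMIN).

```python
import ctypes
import os

# Flag values as defined in the Linux UAPI header <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount namespace (the first one, 2000)
CLONE_NEWUTS = 0x04000000   # hostname etc.
CLONE_NEWIPC = 0x08000000   # shared memory, semaphores, message queues
CLONE_NEWUSER = 0x10000000  # UIDs/GIDs
CLONE_NEWPID = 0x20000000   # process tree
CLONE_NEWNET = 0x40000000   # net devices, addresses, routing

NAMESPACE_FLAGS = {
    "mount": CLONE_NEWNS,
    "uts": CLONE_NEWUTS,
    "ipc": CLONE_NEWIPC,
    "user": CLONE_NEWUSER,
    "pid": CLONE_NEWPID,
    "net": CLONE_NEWNET,
}


def clone_flags(*names: str) -> int:
    """OR together the CLONE_NEW* flags for the given namespace names."""
    flags = 0
    for name in names:
        flags |= NAMESPACE_FLAGS[name]
    return flags


def unshare(flags: int) -> None:
    """Move the calling process into new namespaces via libc's unshare(2).

    Linux only; usually requires root. Raises OSError on failure.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    # Only computes the mask; nothing privileged happens here.
    print(hex(clone_flags("mount", "uts")))
```

A container runtime would typically pass most of these flags at once, which is what makes a group of processes look like a standalone machine.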
2006-2010: up the stream!
This time period was characterized by lots of container-related patches contributed to the Linux kernel, i.e. the upstreaming age. Our company is a few hundred people, and our kernel team is only about 10 people, give or take, and I am very proud of the fact that this upstreaming effort made us appear in the top 10 companies contributing to the Linux kernel. Well, it's the bottom of that top 10, that is. Other companies in that list are way bigger.
Now, upstreaming is probably as complicated for developers as it is for salmon when they run. They die exhausted, they get eaten by grizzly bears, etc. On the right you can see a salmon, err, a developer, and on the left is a bear, err, a Linux kernel subsystem maintainer.
2006
Kernel ports: 2.6.15, FC5, RHEL4, 2.6.18
Weekend project ports to SPARC and Power
Live migration in OpenVZ
Checkpointing and Live Migration
Live migration, simplified:
freeze processes, dump their complete state
copy that dump to other machine
restore from dump; unfreeze!
Initially implemented in the kernel
touches every subsystem (except drivers)
so, really hard to merge upstream
Trying hard to merge cpt/rst
2007
IBM AIX WPARs
HP-UX SRP containers
Rebase to RHEL5 kernel, port to 2.6.20
2007: cgroups framework from Google [PASS]
based on cpusets feature from BULL/SGI
CGroups
Cgroups is a mechanism to control resources for hierarchical groups of processes
Modern alternative to user beancounters
Cgroups is nothing without controllers:
blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio
Cgroups are orthogonal to namespaces
Still working on it: just added kmem controller
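One way to see the "group of processes per controller" idea is the /proc/<pid>/cgroup file, where each line has the form hierarchy-ID:controller-list:cgroup-path. The Python sketch below parses that text into a controller-to-group mapping. The sample input and the group name /ct101 are invented for illustration, and the function name is hypothetical.

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each controller name to the cgroup path the process belongs to.

    Expects lines of the form "hierarchy-ID:controller-list:cgroup-path",
    as found in /proc/<pid>/cgroup. Named hierarchies without a controller
    (e.g. "name=...") are kept under that name; lines with an empty
    controller list are skipped.
    """
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path itself may contain ':' so split at most twice.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:
                membership[controller] = path
    return membership


# Invented sample; on a real Linux box read /proc/self/cgroup instead.
SAMPLE = """\
4:memory:/ct101
3:cpu,cpuacct:/ct101
2:freezer:/
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(f"{controller:10s} {path}")
```

Note that nothing in this file mentions namespaces: a process can sit in a memory cgroup without being in any private namespace, which is what "orthogonal" means on the slide.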
2008-2009
Kernel port to 2.6.25
Weekend project port to ARM
LXC (userspace tool a la vzctl) was born
What is LXC?
At first glance, very similar to OpenVZ
In fact LXC is just a user space tool a la vzctl
LXC uses standard kernel
OpenVZ is a complete set with its own kernel, many tools, libraries etc.
A superset of OpenVZ also exists
as a commercial product (Virtuozzo)
2010
Port to RHEL6
VSwap (RAM/swap limits, simplified UBC)
ploop, aka CT filesystem in a file
on-demand allocation
instant snapshots
online resize, merge, compact
write tracker (improved live migration)
2011-2012: CRIU
Jul 2011: initial proposal for CRIU
Idea: implement most of C/R in userspace using existing APIs
Jul 2012: initial CRIU release (v0.1)
criu.org
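The simplified live migration recipe from earlier in the talk (freeze and dump, copy the dump, restore) maps fairly directly onto the criu command line. The Python sketch below only builds the three command lines and does not run anything. It assumes criu's dump and restore subcommands with the --tree and --images-dir options, plus rsync and ssh for the copy step; the host name, PID and directory are placeholders, and the function names are made up. A real migration also needs the container's filesystem on the destination, which this sketch ignores.

```python
def criu_dump_cmd(pid: int, images_dir: str) -> list[str]:
    """Command that freezes the process tree rooted at `pid` and dumps its state."""
    return ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]


def criu_restore_cmd(images_dir: str) -> list[str]:
    """Command that recreates the processes from a dump and lets them run."""
    return ["criu", "restore", "--images-dir", images_dir]


def migration_plan(pid: int, images_dir: str, dest_host: str) -> list[list[str]]:
    """Return the dump / copy / restore commands in the order they would run."""
    directory = images_dir.rstrip("/")
    return [
        criu_dump_cmd(pid, directory),
        # Copy the image files to the same path on the destination host.
        ["rsync", "-a", directory + "/", f"{dest_host}:{directory}/"],
        # Restore on the destination, reusing the same images directory.
        ["ssh", dest_host, *criu_restore_cmd(directory)],
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for step in migration_plan(1234, "/tmp/ckpt", "dst.example.com"):
        print(" ".join(step))
```

The downtime of this naive scheme is the whole dump plus copy plus restore time, which is why tools built on top of CRIU try to move most of the memory before the final freeze.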
2013
Docker appeared
lmctfy appeared
CoreOS appeared
vzctl adds io/iops limit support
What is Docker?
Docker: containers runtime + app delivery + ...
Docker CTs are apps, OpenVZ CTs are systems
Extremely popular
Docker uses upstream/vanilla/standard kernel, while OpenVZ provides a custom one
Docker is a middleware, OpenVZ is full stack
See also: CoreOS, Rocket
2014
CRIU for Docker & LXC support
LXD announced
OpenStack talks about adding container support
OpenVZ in 2015
New, more open development model
Unified with Virtuozzo
Plays well with Docker (in, out, and on the side)
Virtuozzo 7 is a reboot of OpenVZ. Ten years ago we made the mistake of not having our development process open enough; this time we are trying to fix it. This April we opened our next kernel git repo, and just this Monday we opened our toolchain. We also moved all of our discussions to the public mailing list, and we follow the git fork-branch-pull request model of developing for our tools.
The other thing is next-gen resource management. It's more dynamic, with a user-space daemon which would allow bursts, guarantees and in general more elastic limits.
CRIU in 2015
3 years old, tools at version 1.6.2
Users: Google, Samsung, Huawei, ...
LXC & Docker integrated!
TCP connection migration works!
About 160 patches merged to 3.x - 4.x kernels
under CONFIG_CHECKPOINT_RESTORE
Live migration: p.haul (criu.org/P.Haul)
Future!
Virtuozzo 7
4th gen of resource management: vcmmd
More dynamic, with bursts, guarantees etc.
Proper port to POWER, ARM
CRIU: p.haul, integration (http://criu.org/Integration)
MetaPC? Mosaic?
We will probably be working on proper ARM and POWER ports (the improper ones were done by me years ago just to demonstrate that the container technology is arch-agnostic). The only arch-dependent feature is CPT/RST, as it requires deep knowledge of the architecture to develop. CRIU is currently ported to ARM.
Finally, MetaPC is something we're thinking about: a way to combine many servers into a single big virtual one. This is anti-partitioning, and it will work with the help of CRIU.
[email protected]
@kolyshkin @_openvz_ @__criu__
openvz.org/Contacts
Booth 333 (third floor, far right corner)