Linux Containers – NextGen Virtualization for Cloud (ATL Summit)
DESCRIPTION
Slides presented @ OpenStack Summit 2014 ATL for the "Linux Containers - NextGen Virtualization for Cloud" session. Thanks to all who attended.

TRANSCRIPT
Linux Containers – NextGen Virtualization for Cloud
Boden Russell ([email protected])
OpenStack Summit
May 12 – 16, 2014
Atlanta, Georgia
Definitions
Linux Containers (LXC = LinuX Containers)
– Lightweight virtualization
– Realized using features provided by a modern Linux kernel
– VMs without the hypervisor (kind of)

Containerization of
– (Linux) Operating Systems
– Single or multiple applications
LXC as a technology ≠ LXC “tools”
Hypervisors vs. Linux Containers
[Diagram: three stacks side by side]

Type 1 Hypervisor
– Hardware → Hypervisor → Virtual Machines, each with its own Operating System, bins / libs, and apps

Type 2 Hypervisor
– Hardware → Operating System → Hypervisor → Virtual Machines, each with its own Operating System, bins / libs, and apps

Linux Containers
– Hardware → Operating System → Containers, each with its own bins / libs and apps
Containers share the host's OS kernel and thus are lightweight. However, every container on a host must run on that same kernel.
Containers are isolated, but share the OS and, where appropriate, libs / bins.
LXC Technology Stack
[Diagram: LXC technology stack, bottom to top]

Hardware

Kernel space
– Architecture Dependent Kernel Code
– Kernel: cgroups, namespaces, chroots, LSM
– System Call Interface

User space
– GLIBC / Pseudo FS / User Space Tools & Libs
– Linux Container Tooling (lxc)
– Linux Container Commoditization
– Orchestration & Management
So You Want To Build A Container?
High level checklist
– Process(es)
– Throttling / limits
– Prioritization
– Resource isolation
– Root file system
– Security
Linux Control Groups (cgroups)
Problem
– How do I throttle, prioritize, control and obtain metrics for a group of tasks (processes)?

Solution: control groups (cgroups)
– Device Access
– Resource limiting
– Prioritization
– Accounting
– Control
– Injection
Linux cgroup Subsystems (subsystem – tunable parameters)

blkio
- Weighted proportional block I/O access. Group wide or per device.
- Per device hard limits on block I/O read/write, specified as bytes per second or IOPS.

cpu
- Time period (microseconds per second) a group should have CPU access.
- Group wide upper limit on CPU time per second.
- Weighted proportional value of relative CPU time for a group.

cpuset
- CPUs (cores) the group can access.
- Memory nodes the group can access and migrate ability.
- Memory hardwall, pressure, spread, etc.

devices
- Define which devices and access type a group can use.

freezer
- Suspend / resume group tasks.

memory
- Max memory limits for the group (in bytes).
- Memory swappiness, OOM control, hierarchy, etc.

hugetlb
- Limit HugeTLB size usage.
- Per cgroup HugeTLB metrics.

net_cls
- Tag network packets with a class ID.
- Use tc to prioritize tagged packets.

net_prio
- Weighted proportional priority on egress traffic (per interface).
Linux cgroups Pseudo FS Interface
/sys/fs/cgroup/my-lxc
|-- blkio
|   |-- blkio.io_merged
|   |-- blkio.io_queued
|   |-- blkio.io_service_bytes
|   |-- blkio.io_serviced
|   |-- blkio.io_service_time
|   |-- blkio.io_wait_time
|   |-- blkio.reset_stats
|   |-- blkio.sectors
|   |-- blkio.throttle.io_service_bytes
|   |-- blkio.throttle.io_serviced
|   |-- blkio.throttle.read_bps_device
|   |-- blkio.throttle.read_iops_device
|   |-- blkio.throttle.write_bps_device
|   |-- blkio.throttle.write_iops_device
|   |-- blkio.time
|   |-- blkio.weight
|   |-- blkio.weight_device
|   |-- cgroup.clone_children
|   |-- cgroup.event_control
|   |-- cgroup.procs
|   |-- notify_on_release
|   |-- release_agent
|   `-- tasks
|-- cpu
|   |-- ...
|-- ...
`-- perf_event

echo "8:16 1048576" > blkio.throttle.read_bps_device

cat blkio.weight_device
dev weight
8:1 200
8:16 500

Linux pseudo FS is the interface to cgroups
– Directory per subsystem per cgroup
– Read / write to pseudo file(s) in your cgroup directory
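For example, a minimal sketch of driving cgroups straight from a shell (cgroup v1 layout; the mount point, subsystem, and values here are illustrative):

    # create a cgroup named "my-lxc" under the memory subsystem
    mkdir /sys/fs/cgroup/memory/my-lxc
    # cap the group at 256 MB
    echo 268435456 > /sys/fs/cgroup/memory/my-lxc/memory.limit_in_bytes
    # move the current shell into the group; children inherit membership
    echo $$ > /sys/fs/cgroup/memory/my-lxc/tasks
    # read back the accounting the kernel keeps for the group
    cat /sys/fs/cgroup/memory/my-lxc/memory.usage_in_bytes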
Linux cgroups FS Layout
So You Want To Build A Container?
Linux namespaces
Problem
– How do I provide an isolated view of global resources to a group of tasks (processes)?

Solution: namespaces
– MNT; mount points, file systems, etc.
– PID; processes
– NET; NICs, routing, etc.
– IPC; System V IPC
– UTS; host and domain name
– USER; UID and GID
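A quick way to experiment is the util-linux unshare utility; a sketch (assumes a kernel and util-linux recent enough for these flags):

    # start a shell in new UTS, PID, and MNT namespaces
    sudo unshare --uts --pid --mount --fork /bin/bash
    # inside: rename the host; the global UTS namespace is unaffected
    hostname blue-container
    # inside: remount /proc so process listings reflect the new PID namespace
    mount -t proc proc /proc
    ps aux    # shows only this namespace's processes; bash runs as PID 1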
Linux namespaces: Conceptual Overview
[Diagram: per-namespace views of global resources]

global (i.e. root) namespace
– MNT NS: /, /proc, /mnt/fsrd, /mnt/fsrw, /mnt/cdrom, /run2
– UTS NS: globalhost, rootns.com
– PID NS: 1 /sbin/init, 2 [kthreadd], 3 [ksoftirqd], 4 [cpuset], 5 /sbin/udevd, 6 /bin/sh, 7 /bin/bash
– IPC NS: SHMID 32452 (root), 43321 (boden); SEMID 0 (root), 1 (boden); MSQID (none)
– NET NS: lo UNKNOWN, eth0 UP, eth1 UP, br0 UP; app1 IP:5000, app2 IP:6000, app3 IP:7000
– USER NS: root 0:0, ntp 104:109, mysql 105:110, boden 106:111

purple namespace
– MNT NS: /, /proc, /mnt/purplenfs, /mnt/fsrw, /mnt/cdrom
– UTS NS: purplehost, purplens.com
– PID NS: 1 /bin/bash, 2 /bin/vim
– IPC NS: SHMID (none); SEMID 0 (root); MSQID (none)
– NET NS: lo UNKNOWN, eth0 UP; app1 IP:1000, app2 IP:7000
– USER NS: root 0:0, app 106:111

blue namespace
– MNT NS: /, /proc, /mnt/cdrom, /bluens
– UTS NS: bluehost, bluens.com
– PID NS: 1 /bin/bash, 2 python, 3 node
– IPC NS: SHMID (none); SEMID (none); MSQID (none)
– NET NS: lo UNKNOWN, eth0 DOWN, eth1 UP; app1 IP:7000, app2 IP:9000
– USER NS: root 0:0, app 104:109
Linux namespaces & cgroups: Availability
Note: user namespace support is in the upstream kernel 3.8+, but distributions are rolling out phased support:
– Map LXC UID/GID between container and host
– Non-root LXC creation
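The mapping itself is visible in /proc; an illustrative (hypothetical) example for a container's init process:

    # format: <ID inside the namespace> <ID on the host> <range length>
    cat /proc/<container-init-pid>/uid_map
    #   0  100000  65536
    # i.e. root (UID 0) inside the container is unprivileged UID 100000 on the host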
So You Want To Build A Container?
Linux chroot & pivot_root

Using pivot_root with the MNT namespace addresses chroot-escape concerns
The pivot_root target directory becomes the "new root FS"
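A sketch of the classic sequence, assuming a prepared root FS at /mnt/newroot and a fresh MNT namespace (pivot_root requires the new root to be a mount point):

    mount --bind /mnt/newroot /mnt/newroot   # ensure the new root is a mount point
    cd /mnt/newroot
    mkdir -p oldroot                         # pivot_root needs a home for the old root
    pivot_root . oldroot                     # this directory becomes "/"; the old root moves to /oldroot
    exec chroot . /bin/sh                    # continue under the new root
    # then, from the new shell: umount -l /oldroot   (detach the old root entirely)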
So You Want To Build A Container?
Linux Security Modules & MAC

Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations

MAC vs DAC
– In MAC, an admin (user or process) assigns access controls to the subject / initiator
– In DAC, the resource owner (user) assigns access controls to individual resources

Existing LSM implementations include AppArmor, SELinux, GRSEC, etc.
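To check which MAC implementation is active on a host (a sketch; commands vary by distro, and the config key shown is the LXC 1.0 AppArmor option):

    sudo aa-status    # AppArmor: loaded profiles and their enforcement mode
    getenforce        # SELinux: Enforcing / Permissive / Disabled
    # an LXC 1.0 container config can select an AppArmor profile, e.g.:
    #   lxc.aa_profile = lxc-container-default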
Linux Capabilities
Per-process privileges which define system call access
Can be assigned to LXC process(es)
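A sketch using the libcap tools, plus the LXC 1.0 config key for per-container capability drops (capability names illustrative):

    # list the capabilities of the current process
    capsh --print
    # start a shell with selected capabilities removed
    sudo capsh --drop=cap_net_raw,cap_sys_module --
    # or drop capabilities per container in an LXC 1.0 config, e.g.:
    #   lxc.cap.drop = sys_module sys_time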
Other Security Measures
– Reduce shared FS access using RO bind mounts
– Linux seccomp
• Confine system calls
– Keep the Linux kernel up to date
– User namespaces in the 3.8+ kernel
• Launch containers as a non-root user
• Map UID / GID into the container
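A sketch of the RO bind-mount and seccomp pieces (paths illustrative; lxc.seccomp is the LXC 1.0 config key):

    # expose a host directory to a container read-only (bind, then remount RO)
    mount --bind /opt/shared /var/lib/lxc/my-lxc/rootfs/opt/shared
    mount -o remount,ro,bind /var/lib/lxc/my-lxc/rootfs/opt/shared
    # confine the container's system calls with a seccomp policy file, e.g.:
    #   lxc.seccomp = /usr/share/lxc/config/common.seccomp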
So You Want To Build A Container?
LXC Industry Tooling

Virtuozzo
– Summary: Commercial product using OpenVZ under the hood
– Part of upstream kernel? No
– License: Commercial
– APIs / Bindings: CLI, API
– Management plane / Dashboard: Virtuozzo Parallels

OpenVZ
– Summary: Custom kernel providing well-seasoned LXC support
– Part of upstream kernel? No
– License: GNU GPL v2
– APIs / Bindings: CLI, C
– Management plane / Dashboard: Virtuozzo Parallels + others

Linux-VServer
– Summary: A set of kernel patches providing LXC; not based on cgroups or namespaces
– Part of upstream kernel? Partial
– License: GNU GPL v2

Libvirt-lxc
– Summary: Libvirt support for LXC via cgroups and namespaces
– Part of upstream kernel? Yes
– License: GNU LGPL
– APIs / Bindings: CLI, C, Python, Java, C#, PHP
– Management plane / Dashboard: OpenStack, Archipel, Virt-Manager

LXC (tools)
– Summary: Lib + set of user space tools / bindings for LXC
– Part of upstream kernel? Yes
– License: GNU LGPL
– APIs / Bindings: Python, Lua, GO, CLI
– Management plane / Dashboard: LXC web panel, Lexy

Warden
– Summary: LXC management tooling used by CF
– Part of upstream kernel? Yes
– License: Apache v2

lmctfy
– Summary: Similar to LXC, but provides a more intent-based focus
– Part of upstream kernel? Yes, but additional patches needed for specific features
– License: Apache v2

Docker
– Summary: Commoditization of LXC adding support for images, build files, etc.
– Part of upstream kernel? Yes
– License: Apache v2
– APIs / Bindings: GO, REST, CLI, Python, other 3rd-party libs
– Management plane / Dashboard: OpenStack, Shipyard, Docker UI
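To give a flavor of the commoditization row, a hedged docker (circa 1.0) example that ties images back to the cgroup knobs covered earlier:

    # fetch a prebuilt image from the public registry
    docker pull ubuntu
    # boot a container with a 256 MB memory cap and relative CPU shares;
    # -m and -c map onto the memory and cpu cgroup subsystems
    docker run -i -t --name my-lxc -m 256m -c 512 ubuntu /bin/bash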
LXC Orchestration & Management

Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VMs… but not with the level of support & features we might like

CoreOS
– Zero-touch admin Linux distro with docker images as the unit of operation
– Centralized key/value store to coordinate the distributed environment

Various other 3rd party apps
– Maestro for docker
– Shipyard for docker
– Fleet for CoreOS
– Etc.

LXC migration
– Container migration via CRIU

But…
– Still no great way to tie all virtual resources together with LXC – e.g. storage + networking
• IMO, an area which needs focus for LXC to become more generally applicable
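With the nova docker virt driver, images flow through glance and containers boot like any other instance; a sketch roughly following the OpenStack Docker wiki (image and flavor names illustrative):

    # push a docker image into glance so nova can schedule it
    docker save ubuntu | glance image-create --is-public=True \
        --container-format=docker --disk-format=raw --name ubuntu
    # boot it through the usual nova workflow
    nova boot --image ubuntu --flavor m1.small my-container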
CLOUDY BENCHMARKING WITH KVM, DOCKER AND OPENSTACK
Benchmark Environment Topology @ SoftLayer
[Diagram: two parallel OpenStack deployments @ SoftLayer, identical except for the virt driver]

Controller
– glance api / reg, nova api / cond / etc, keystone, rally, …

Compute node
– nova api / cond / etc, cinder api / sch / vol, dstat
– virt driver: docker lxc on one deployment, KVM on the other
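Compute-node metrics in these tests come from dstat; a sketch of a typical invocation (output path illustrative):

    # sample CPU, memory, disk, and network once per second, logging to CSV
    dstat --time --cpu --mem --disk --net --output /tmp/compute-node.csv 1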
Cloudy Performance: Steady State Packing

Benchmark scenario overview
– Pre-cache VM image on compute node prior to test
– Boot 15 VMs asynchronously in succession
– Wait for 5 minutes (to achieve steady-state on the compute node)
– Delete all 15 VMs asynchronously in succession

Benchmark driver
– cpu_bench.py

High level goals
– Understand compute node characteristics under steady-state conditions with 15 packed / active VMs

[Chart: benchmark visualization – active VMs over time]
Cloudy Performance: Serial VM Boot

Benchmark scenario overview
– Pre-cache VM image on compute node prior to test
– Boot VM
– Wait for VM to become ACTIVE
– Repeat the above steps for a total of 15 VMs
– Delete all VMs

Benchmark driver
– OpenStack Rally

High level goals
– Understand compute node characteristics under sustained VM boots

[Chart: benchmark visualization – active VMs over time]
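Since the driver here is OpenStack Rally, a hedged sketch of what such a task might look like (scenario name and runner values illustrative, not the exact task file used):

    # boot-and-delete.json:
    # {
    #   "NovaServers.boot_and_delete_server": [
    #     {"args": {"flavor": {"name": "m1.small"}, "image": {"name": "ubuntu"}},
    #      "runner": {"type": "serial", "times": 15}}
    #   ]
    # }
    rally task start boot-and-delete.json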
Cloudy Performance: Serial VM Reboot

Benchmark scenario overview
– Pre-cache VM image on compute node prior to test
– Boot a VM & wait for it to become ACTIVE
– Soft reboot the VM and wait for it to become ACTIVE
• Repeat the reboot a total of 5 times
– Delete VM
– Repeat the above for a total of 5 VMs

Benchmark driver
– OpenStack Rally

High level goals
– Understand compute node characteristics under sustained VM reboots

[Chart: benchmark visualization – active VMs over time]
Cloudy Performance: Snapshot VM To Image
Benchmark scenario overview
– Boot a VM
– Wait for it to become active
– Snapshot the VM
– Wait for the image to become active
– Delete VM
Cloudy Ops: VM Boot
[Chart: Average Server Boot Time in seconds – docker 3.53, KVM 5.78]
Cloudy Ops: VM Reboot
[Chart: Average Server Reboot Time in seconds – docker 2.58, KVM 124.43]
Cloudy Ops: VM Delete
[Chart: Average Server Delete Time in seconds – docker 3.57, KVM 3.48]
Cloudy Ops: VM Snapshot
[Chart: Average Snapshot Server Time in seconds – docker 36.89, KVM 48.02]
Cloudy Performance: Steady State Packing
[Chart: Docker compute node CPU (full test duration), CPU usage in percent over time – averages: usr 0.54, sys 0.17]
[Chart: KVM compute node CPU (full test duration), CPU usage in percent over time – averages: usr 7.64, sys 1.4]
Cloudy Performance: Steady State Packing
[Chart: Docker compute node steady-state CPU (segment: 31s – 243s), CPU usage in percent – averages: usr 0.2, sys 0.03]
[Chart: KVM compute node steady-state CPU (segment: 95s – 307s), CPU usage in percent – averages: usr 1.91, sys 0.36]
Cloudy Performance: Steady State Packing
[Chart: Docker / KVM compute node used memory (overlay) – docker: delta 734 MB, per VM 49 MB; KVM: delta 4387 MB, per VM 292 MB]
Cloudy Performance: Serial VM Boot
[Chart: Docker compute node CPU usage in percent over time – averages: usr 1.39, sys 0.57]
[Chart: KVM compute node CPU usage in percent over time – averages: usr 13.45, sys 2.23]
Cloudy Performance: Serial VM Boot
[Chart: Docker / KVM serial VM boot usr CPU (segment: 8s – 58s) – linear fits: docker f(x) = 0.0095x + 1.008, KVM f(x) = 0.3582x + 1.063]
Cloudy Performance: Serial VM Boot
[Chart: Docker / KVM compute node memory used (unnormalized overlay) over time]
Cloudy Performance: Serial VM Boot
[Chart: Docker / KVM serial VM boot memory usage (segment: 1s – 67s) – linear fits (bytes): docker f(x) ≈ 11,773,408x + 1,449,606,116, KVM f(x) ≈ 29,765,955x + 1,178,597,199]
Guest Ops: Network
[Chart: Network Throughput in 10^6 bits/second – docker 940.26, KVM 940.56]
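Throughput numbers of this kind are what netperf reports; a minimal sketch (server address is a placeholder):

    netserver                                    # on the system under test
    netperf -H <server-ip> -l 60 -t TCP_STREAM   # 60s TCP bulk-transfer test from the client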
Guest Ops: Near Bare Metal Performance
Typical docker LXC performance near par with bare metal
[Chart: linpack performance @ 45000, GFlops vs vcpus – bare metal 220.77; docker 220.5 @ 32 vcpu, 220.9 @ 31 vcpu]
[Chart: Memory Benchmark Performance, MiB/s for MEMCPY / DUMB / MCBLOCK tests – Bare Metal vs docker vs KVM]
Guest Ops: File I/O Random Read / Write
[Chart: Sysbench synchronous file I/O random read/write @ R/W ratio of 1.50 – total transferred in Kb/sec vs threads (1, 2, 4, 8, 16, 32, 64), docker vs KVM]
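A sketch of the sysbench (0.4.x syntax) file I/O workload behind a chart like this (file size, thread count, and duration illustrative):

    sysbench --test=fileio --file-total-size=2G prepare
    sysbench --test=fileio --file-total-size=2G --file-test-mode=rndrw \
             --num-threads=8 --max-time=60 --max-requests=0 run
    sysbench --test=fileio --file-total-size=2G cleanup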
Guest Ops: MySQL OLTP
[Chart: MySQL OLTP random transactional R/W (60s) – total transactions vs threads (1, 2, 4, 8, 16, 32, 64), docker vs KVM]
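And a matching sysbench OLTP sketch (credentials, table size, and thread count illustrative):

    # build the test table, then drive random transactional R/W for 60s
    sysbench --test=oltp --mysql-user=root --mysql-password=secret \
             --oltp-table-size=1000000 prepare
    sysbench --test=oltp --mysql-user=root --mysql-password=secret \
             --oltp-test-mode=complex --num-threads=16 --max-time=60 \
             --max-requests=0 run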
Guest Ops: MySQL Indexed Insertion
[Chart: MySQL Indexed Insertion @ 100K intervals – seconds per 100K insertion batch vs table size in rows (100K – 1M), docker vs kvm]
Cloud Management Impacts on LXC
[Chart: Docker: Boot Container – CLI vs Nova virt, in seconds – docker cli 0.17, nova-docker 3.53]

Cloud management often caps the true ops performance of LXC
Ubuntu MySQL Image Size
[Chart: Docker / KVM: Ubuntu MySQL image size in MB – docker 381.5, kvm 1080]

Out of the box JeOS images for docker are lightweight
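Image footprints are easy to compare from the docker CLI; illustrative output (repository name and ID hypothetical):

    docker images
    # REPOSITORY     TAG     IMAGE ID      CREATED       VIRTUAL SIZE
    # ubuntu-mysql   latest  <image-id>    2 days ago    381.5 MB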
LXC In Summary
– Near bare metal performance in the guest
– Fast operations in the Cloud
– Reduced resource consumption (CPU, MEM) on the compute node
– Out of the box smaller image footprint
LXC Gaps
There are gaps…
– Lack of industry tooling / support
– Live migration still a WIP
– Full orchestration across resources (compute / storage / networking)
– Fears of security
– Not a well known technology… yet
– Integration with existing virtualization and Cloud tooling
– Not much / any industry standards
– Missing skillset
– Slower upstream support due to kernel dev process
– Memory / CPU proc FS not cgroup aware
– Etc.
References & Related Links

http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc
http://bodenr.blogspot.com/2014/05/kvm-and-docker-lxc-benchmarking-with.html
https://www.docker.io/
http://sysbench.sourceforge.net/
http://dag.wiee.rs/home-made/dstat/
http://www.openstack.org/
https://wiki.openstack.org/wiki/Rally
https://wiki.openstack.org/wiki/Docker
http://devstack.org/
http://www.linux-kvm.org/page/Main_Page
https://github.com/stackforge/nova-docker
https://github.com/dotcloud/docker-registry
http://www.netperf.org/netperf/
http://www.tokutek.com/products/iibench/
http://www.brendangregg.com/activebenchmarking.html
http://wiki.openvz.org/Performance
IBM Sponsored Sessions

Monday, May 12 – Room B314
– 12:05 – 12:45: OpenStack is Rockin' the OpenCloud Movement! Who's Next to Join the Band? – Angel Diaz, VP Open Technology and Cloud Labs; David Lindquist, IBM Fellow, VP, CTO Cloud & Smarter Infrastructure

Wednesday, May 14 – Room B312
– 9:00 – 9:40: Getting from enterprise ready to enterprise bliss – why OpenStack and IBM is a match made in Cloud heaven – Todd Moore, Director, Open Technologies and Partnerships
– 9:50 – 10:30: Taking OpenStack beyond Infrastructure with IBM SmartCloud Orchestrator – Andrew Trossman, Distinguished Engineer, IBM Common Cloud Stack and SmartCloud Orchestrator
– 11:00 – 11:40: IBM, SoftLayer and OpenStack – present and future – Michael Fork, Cloud Architect
– 11:50 – 12:30: IBM and OpenStack: Enabling Enterprise Cloud Solutions Now – Tammy Van Hove, Distinguished Engineer, Software Defined Systems
IBM Technical Sessions

Monday, May 12: 3:40 – 4:20, 3:40 – 4:20
Tuesday, May 13: 11:15 – 11:55, 2:00 – 2:40, 5:30 – 6:10, 5:30 – 6:10
Wednesday, May 14: 9:50 – 10:30, 2:40 – 3:20
Thursday, May 15: 9:50 – 10:30, 1:30 – 2:10, 2:20 – 3:00
Be sure to stop by the IBM booth to see some demos and get your rockin’ OpenStack t-shirt while they last.
Don’t miss Monday evening’s booth crawl where you can enjoy Atlanta’s own SWEET WATER IPA!
Thank you!