Breaking the RPiDocker Challenge
Nicolas De Loof, Yoann Dubreuil, Damien Duportal
RPiDocker Challenge
“Let’s break the challenge.”
Methodology
“Measure and automate all the things.”
Damien Duportal (@DamienDuportal)
1 - Measure and automate all the things
Measurements:
● sysstat for post-mortem analysis
● node-collector from Prometheus.io for “real time”
Provisioning:
● basic shell script published on Damien’s GitHub
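A minimal sketch of such a setup, assuming the Debian sysstat package and a Prometheus node exporter binary (the exact tooling behind the slide’s “node-collector” is an assumption here):

```shell
# Post-mortem: sysstat's collector samples system activity, sar reads it back
sudo apt-get install -y sysstat
sar -r 5 3                 # e.g. three memory-usage samples, 5 seconds apart
# "Real time": run a Prometheus node exporter (binary assumed already downloaded)
./node_exporter &          # serves metrics on :9100 by default
```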
Yoann Dubreuil (@YoannDubreuil)
“Brainstorm for ideas, then test everything in arbitrary order”
Nicolas De Loof (@ndeloof)
“... and have some beer”
Nicolas & Yoann: where to start?
● first naïve try
○ only 38 containers :-\
○ but 70 on an RPi1 #WTF?
● figure out RPi2 limits without Docker
○ web server footprint
○ network namespace footprint
● get some help!
○ let’s collaborate with @DamienDuportal (aka the “French mafia”)
2 - Systemd tuning
The Docker daemon runs as root…
… but it still has some limits set by systemd (hence the 38-container ceiling)
LimitSIGPENDING=infinity
LimitNOFILE=infinity
LimitAS=infinity
LimitNPROC=infinity
LimitSTACK=?
● Default stack size is 8 MB
○ each thread stack consumes 8 MB of process VM space (8 MB * 4 threads * 38 containers ≈ 1.2 GB)
=> tweak LimitSTACK to reach ~1,800-2,000 containers
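One way to apply these limits is a systemd drop-in for the Docker unit; a sketch, with file names assumed and the LimitSTACK value left as a placeholder since the slide leaves it as “?”:

```shell
# Hypothetical drop-in for the docker.service unit
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/limits.conf <<'EOF'
[Service]
LimitSIGPENDING=infinity
LimitNOFILE=infinity
LimitAS=infinity
LimitNPROC=infinity
# LimitSTACK=<size>   # the slides leave the exact value as "?"
EOF
systemctl daemon-reload && systemctl restart docker
```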
3 - Lower the container footprint
● Tried a custom-compiled nginx for ARM with only a few extensions
~ 80 containers
● The per-container footprint is still too big. Reading the Hypriot blog carefully: “rpi-nano-httpd”, written in ARM assembly code, is already highly optimized
➢ 1 page for code
➢ 1 page for data
➢ 1 page for stack
➢ 1 page for the vDSO
=> a 16 kB memory footprint per process!
~150 containers
● launched 27,000 on an RPi2
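The 16 kB figure is just four pages at the 4 kB page size ARM Linux uses by default; a quick check of the arithmetic:

```shell
# 4 pages (code, data, stack, vDSO) x 4 kB ARM page size
page_kb=4
pages=4
footprint_kb=$((pages * page_kb))
echo "${footprint_kb} kB per rpi-nano-httpd process"
```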
Network namespace RPi2 limit
● launched the web server in a dedicated network namespace
ip netns exec <NS_NUMBER> httpd
● the RPi2 limit is ~1,100 network namespaces
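A brute-force way to find such a ceiling is simply to create namespaces until the kernel refuses; a sketch (needs root; the name prefix is arbitrary):

```shell
# Count how many network namespaces this kernel will hand out
i=0
while ip netns add "probe$i" 2>/dev/null; do
  i=$((i + 1))
done
echo "created $i network namespaces"
# cleanup
for ns in $(ip netns list | cut -d' ' -f1); do ip netns delete "$ns"; done
```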
=> To break the challenge, we needed to run without network isolation
--net=host
Reached ~1,000 containers
4 - Speed up testing !
Launching thousands of containers on an RPi2 takes hours, if not days!
● everything in memory with zram devices
○ swap (ratio 5:1)
○ /var/lib/docker on an ext4 FS (ratio 10:1)
● swap as early as possible to keep memory free (vm.swappiness = 100)
● more CPU for Go with GOMAXPROCS=4
● reduce kernel perf event slowdown
○ kernel.perf_cpu_time_max_percent = 1
● USB external disk instead of the low-performance, I/O-limited SD card
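A sketch of the zram side, with device sizes as assumptions (the slides only give the compression ratios achieved):

```shell
# Two compressed RAM devices: one for swap, one for /var/lib/docker
modprobe zram num_devices=2
echo 512M > /sys/block/zram0/disksize      # size is an assumption
mkswap /dev/zram0 && swapon /dev/zram0
echo 1G > /sys/block/zram1/disksize        # size is an assumption
mkfs.ext4 -q /dev/zram1
mount /dev/zram1 /var/lib/docker
sysctl vm.swappiness=100                   # swap as early as possible
sysctl kernel.perf_cpu_time_max_percent=1  # cap perf-event overhead
```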
5 - Docker tuning
● Disable the proxy process: no use here
● No logging: --log-driver=none
● Disable networking / port forwarding:
--bridge=none --iptables=false --ipv6=false --ip-forward=false --ip-masq=false --userland-proxy=false --sig-proxy=false
● Reduce Golang memory consumption
○ launched docker with GODEBUG=gctrace=1 GOGC=1
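Put together, the tuned daemon launch might look like this (a Docker 1.x-era `docker daemon` invocation is assumed; note `--sig-proxy` is a `docker run` option, not a daemon flag):

```shell
# Sketch: start the daemon with logging, networking and the proxy disabled,
# and the Go runtime tuned for low memory / 4 cores
GODEBUG=gctrace=1 GOGC=1 GOMAXPROCS=4 \
  docker daemon \
    --log-driver=none \
    --bridge=none --iptables=false --ipv6=false \
    --ip-forward=false --ip-masq=false \
    --userland-proxy=false
```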
6 - System tuning
● limit memory consumption
○ reduce GPU memory to 16 MB (can’t go lower)
○ blacklisted non-required Linux modules
● relax some Linux limits
○ vm.overcommit_memory = 1
○ kernel.pid_max = 32768
○ kernel.threads-max = 14812
● reduce thread stack size
○ smallest working thread stack size: 24 kB
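The small-stack idea can be tried from a shell with ulimit, which expresses the stack limit in kB, so 24 matches the 24 kB from the slide:

```shell
# Lower the stack limit for a child shell and read it back
small_stack=$(sh -c 'ulimit -s 24; ulimit -s')
echo "stack limit: ${small_stack} kB"
```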
Did not work
● Btrfs
○ not working properly: strange web server 404 failures after ~20 successful launches
○ stuck with overlayfs
● LXC driver
○ way sloooooooower
○ 4 threads per container anyway
● Go 1.5
○ compiled Docker with Go 1.5 for its “better GC”; no significant impact
Challenge Completed
● We started 2,499 containers!
● RAM on the RPi2 was not exhausted, but the Docker daemon crashed
docker[307]: runtime: program exceeds 10000-thread limit
Why is there a limit?
● 4 threads per container
● 10,000-thread limit for a Go application
=> 2,500 containers max
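The ceiling follows directly from those two numbers; a one-line check:

```shell
# 10,000 Go runtime threads / 4 threads per container
threads_per_container=4
go_thread_limit=10000
max_containers=$((go_thread_limit / threads_per_container))
echo "$max_containers containers max"
```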
We needed to understand why Docker needs 4 threads per container (hey, lots of Docker core contributors here, time to ask!)
Worked around this with runtime/debug.SetMaxThreads(12000)
● this hack is not eligible for the RPiDocker challenge; it was just to confirm the limit
● can run ~2,740 web server containers before an actual OOM
“Collaboration (and beer) were the keys to breaking this challenge!”
Thank you! @ndeloof @YoannDubreuil @DamienDuportal