siteground tech teambuilding
What do we have?
● cpuset - whole cores and cpu mapping
● cpuacct - cpu cycle accounting
● cpu - less than core granularity
● memory - limits and accounting
● blkio - limits and accounting
● net_cls - network classification
● net_prio - network priority
● Freezer + checkpoint/restore - migration
General structure
● tasks
– attach a task (thread) and show the list of threads
● cgroup.procs
– show the list of processes
# mount -t cgroup none /cgroups
# mount -t cgroup -o cpuset cpuset /cg/cpuset
How to use them?
● Create cgroup
# mkdir /cgroup/GRP
● Prepare minimum limits
# echo 0-2 > /cgroup/GRP/cpuset.cpus
# echo 0-1 > /cgroup/GRP/cpuset.mems
● Add a process to a cgroup:
# echo PID > /cgroup/GRP/tasks
● Verify that a process is in the cgroup
# grep PID /cgroup/GRP/tasks
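The four steps above can be wrapped in one small helper. This is only a sketch: the function name cg_add_pid and the CG_ROOT variable are made up for illustration, and it assumes a cpuset hierarchy is mounted at /cgroup as in the slide.

```shell
#!/bin/sh
# Sketch: create a cpuset cgroup and move one process into it.
# CG_ROOT and cg_add_pid are illustrative names, not part of any tool.
CG_ROOT="${CG_ROOT:-/cgroup}"

cg_add_pid() {
    grp="$1"; pid="$2"; cpus="$3"; mems="$4"
    dir="$CG_ROOT/$grp"
    mkdir -p "$dir" || return 1
    # cpuset wants cpus and mems set before it accepts any task
    echo "$cpus" > "$dir/cpuset.cpus" || return 1
    echo "$mems" > "$dir/cpuset.mems" || return 1
    echo "$pid" > "$dir/tasks" || return 1
    # verify: -x matches the whole line, so PID 12 does not match 123
    grep -qx "$pid" "$dir/tasks"
}

# Usage (as root): cg_add_pid GRP 1234 0-2 0-1
```

Note the grep -x in the last step: a plain grep PID would also match longer PIDs that contain the same digits.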
cpuset
● Physical CPU & Memory limits
– cpuset.cpus - list of allowed CPUs
– cpuset.mems - list of allowed memory nodes
– cpuset.cpu_exclusive - 0/1, whether the CPUs are exclusive to this group
– cpuset.mem_exclusive - 0/1, whether the memory nodes are exclusive to this group
● Documentation/cgroups/cpusets.txt
CPU accounting
● CPU usage combined for all CPUs (in nanoseconds)
● CPU usage per CPU (in nanoseconds)
● per CPU and user/system (in USER_HZ)
● Documentation/cgroups/cpuacct.txt
CPU
● CPU scheduler limits (CONFIG_CGROUP_SCHED)
– cpu.shares
– cpu.cfs_quota_us: in microseconds
– cpu.cfs_period_us: in microseconds (default 100ms)
– cpu.stat: exports throttling statistics
nr_throttled: Number of times the group has been throttled/limited.
throttled_time: The total time duration (in nanoseconds) for which entities of the group have been throttled.
● Documentation/scheduler/sched-bwc.txt
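To see whether a group is actually hitting its limit, the two cpu.stat fields can be pulled out with awk. A minimal sketch; the function name cpu_throttle_summary is hypothetical, and the path in the usage line assumes the cpu controller is mounted under /cgroup.

```shell
#!/bin/sh
# Print "<nr_throttled> <throttled seconds>" for a given cpu.stat file.
# throttled_time is reported in nanoseconds, so divide by 10^9.
cpu_throttle_summary() {
    awk '$1 == "nr_throttled"   { n = $2 }
         $1 == "throttled_time" { t = $2 }
         END { printf "%d %.3f\n", n, t / 1000000000 }' "$1"
}

# Usage: cpu_throttle_summary /cgroup/GRP/cpu.stat
```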
CPU examples
q - quota, p - period (values in ms)
[Diagram: CPU 0, CPU 1, CPU 2, CPU 3]
q: 500 p: 500
q: 1000 p: 500
q: 1500 p: 500
q: 2000 p: 500
# echo 250000 > cpu.cfs_quota_us
# echo 500000 > cpu.cfs_period_us
q: 250 p: 500
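The quota/period pairs translate into a number of CPUs by simple division: quota 500 with period 500 is one full CPU, quota 2000 with period 500 is four, quota 250 with period 500 is half of one. A small sketch of that arithmetic and of the two writes; cfs_cpus and cfs_set_ms are illustrative helper names, not existing commands.

```shell
#!/bin/sh
# How many CPUs worth of time does a quota/period pair allow?
cfs_cpus() {
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.2f\n", q / p }'
}

# Write quota and period (given in ms) into a cgroup directory.
# The kernel files take microseconds, hence the * 1000.
cfs_set_ms() {
    dir="$1"; quota_ms="$2"; period_ms="$3"
    echo $((period_ms * 1000)) > "$dir/cpu.cfs_period_us" || return 1
    echo $((quota_ms * 1000)) > "$dir/cpu.cfs_quota_us"
}

# Usage (as root): cfs_set_ms /cgroup/GRP 250 500
```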
memory
Only Memory
● memory.usage_in_bytes
– show current res_counter usage for memory
● memory.limit_in_bytes
– set/show limit of memory usage
● memory.failcnt
– show the number of times memory usage hit the limit
Memory + Swap
● memory.memsw.usage_in_bytes
● memory.memsw.limit_in_bytes
● memory.memsw.failcnt
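A quick way to use the usage and limit files together is to print how full a group is. A sketch only; mem_usage_pct is a hypothetical helper and the path in the usage line assumes the memory controller is mounted under /cgroup.

```shell
#!/bin/sh
# Print memory usage of a cgroup as a percentage of its limit.
# A group without a limit reports a huge limit_in_bytes value,
# so the result is then close to 0.
mem_usage_pct() {
    dir="$1"
    usage="$(cat "$dir/memory.usage_in_bytes")" || return 1
    limit="$(cat "$dir/memory.limit_in_bytes")" || return 1
    awk -v u="$usage" -v l="$limit" 'BEGIN { printf "%.1f\n", 100 * u / l }'
}

# Usage: mem_usage_pct /cgroup/GRP
```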
memory
Kernel Memory limits
● memory.kmem.limit_in_bytes
– set/show hard limit for kernel memory
● memory.kmem.usage_in_bytes
– show current kernel memory allocation
● memory.kmem.failcnt
– show the number of times kernel memory usage hit the limit
blkio
/            1024
|- lxc/      900
|  |- c120   450
|  |- c121   450
|  |- c122   450
|  |- c123   450
So each container can get only 50% of the total I/O of the LXC cgroup
Network
● Adding a network class to each cgroup so you can later limit it with tc
– Documentation/cgroups/net_cls.txt
● Prioritizing network traffic on an interface
– Documentation/cgroups/net_prio.txt
Freezer + CRIU
● freezer.state
– THAWED
– FREEZING
– FROZEN
● freezer.self_freezing
– 0 (thawed) / 1 (frozen)
● freezer.parent_freezing
– 0 if no ancestor is frozen, 1 otherwise
● CRIU - Checkpoint and Restore In Userspace
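Freezing is not instant: after FROZEN is written, freezer.state can read FREEZING for a while until every task has stopped. A sketch of a helper that writes the state and polls for it; cg_freeze and cg_thaw are illustrative names and the retry count is an arbitrary choice.

```shell
#!/bin/sh
# Ask the freezer to freeze a cgroup, then wait until it reports FROZEN.
# Gives up after $2 polls (default 10, one second apart).
cg_freeze() {
    dir="$1"; tries="${2:-10}"
    echo FROZEN > "$dir/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "$dir/freezer.state")" = "FROZEN" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Let the tasks run again.
cg_thaw() {
    echo THAWED > "$1/freezer.state"
}

# Usage (as root): cg_freeze /cgroup/GRP && echo "safe to checkpoint"
```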
What namespaces do we have?
● UTS namespace
● User namespace
● PID namespace
● IPC namespace
● Mount namespace
● Network namespace
User namespace
User authentication and mapping files:
● /etc/passwd
● /etc/group
● /etc/shadow
- What if we want to create a user called pesho, but such a user already exists?
- What if we want to create user joan with UID 1005, but there is already a user pesho with UID 1005?
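With user namespaces the UID clash goes away because each namespace has its own mapping: the kernel exposes it in /proc/PID/uid_map as lines of "inside-ID outside-ID length". A sketch of the translation, so UID 1005 inside two namespaces can land on two different host UIDs. The helper name ns_uid_to_host and the 100000 base in the usage line are assumptions for illustration.

```shell
#!/bin/sh
# Translate a UID as seen inside a user namespace to the host UID,
# using a file in /proc/PID/uid_map format:
#   <first inside ID> <first outside ID> <length>
# Returns non-zero when the UID is not mapped.
ns_uid_to_host() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }' "$2"
}

# Usage: ns_uid_to_host 1005 /proc/1234/uid_map
# With a map line "0 100000 65536" this prints 101005.
```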
IPC namespace
Unix/Linux IPCs
- unix domain sockets
- shared memory
- semaphores
- message queues
/proc/PID/fd/
|- 3 -> socket:[3537]
key        shmid      owner    perms bytes    nattch
0x0052e2c1 1139834880 postgres 600   37879808 4
Network namespace
- IP
- IPv6
- Routing
- TCP
- UDP
- SCTP
- DCCP
- RDS
● Having a separate loopback device for a process
● Or simply test the MySQL server on the same IP
● Completely different routing for a process
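The separate loopback case can be tried with iproute2. A sketch that creates a named network namespace and brings up its own lo; the function names are illustrative, and DRY_RUN=1 only prints the commands since the real ones need root.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Create a network namespace with its own, initially empty, network stack
# and bring up the loopback device inside it.
netns_with_loopback() {
    ns="$1"
    run ip netns add "$ns" || return 1
    run ip netns exec "$ns" ip link set lo up
}

# Usage (as root): netns_with_loopback testns
# Then run anything inside it: ip netns exec testns <command>
```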
Mount namespace
the most complex one...
having only one / is a problem...
- at around 22000 mounts everything on your machine starts to lag... no matter how many cores or ram you have :(
- having a different /proc/mounts per process would be nice and very interesting to implement... :)
PID namespace
Migration of processes between machines (CRIU)
It allows you to have two or more processes running with the same PID.
PID - is the PID on the host machine
NSPID - is the PID that the process sees
PID   NSPID
1421  5420   ssh-agent
1730  5420   xchat
1756  5420   firefox
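On kernels that expose an NSpid: line in /proc/PID/status, both numbers can be read from the host: the first value is the PID in the reader's namespace and the last one is the PID the process sees itself. A sketch; the helper name nspid_from_status is made up.

```shell
#!/bin/sh
# Print "<host PID> <PID inside the innermost namespace>" from a
# /proc/PID/status style file. Prints nothing if there is no NSpid line.
nspid_from_status() {
    awk '$1 == "NSpid:" { print $2, $NF }' "$1"
}

# Usage: nspid_from_status /proc/1756/status
```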
Avatar Design
Avatar Master
Host Servers    Backup Server
Schedule backup jobs
Avatar Design
Avatar Master
Host Server    Backup Server
Start backups
Each backup server has a limit of maximum simultaneous jobs.
- max jobs
- max backups
- max restores
Avatar Design
Avatar Master
Host Server    Backup Server
Report status
Each backup reports a lot of things:
- thinpool data usage
- mounted df output
- LV df output
- archive_size
- broken dbs
- remote_addr
- user IP
- exit_code
- caller_pid
- interface_type
- last_progress
Layered backups
File
Physical Volume
Volume Group
ThinPool
Logical Volume
Snapshot6
Snapshot5
Snapshot4
Snapshot3
Snapshot2
Snapshot1
Snapshot0
Loop mount
Backup Server Structure
/sdb/avatar on /var/backups type none (rw,bind)
# ls /var/backups/siteground200.com/
total 33333656
-rw------- 1 root root 32212254720 Jul 22 04:03 camerafi
-rw------- 1 root root 32212254720 Jul 22 01:36 celticc1
-rw------- 1 root root 32212254720 Jul 22 00:57 citecang
-rw------- 1 root root 32212254720 Jul 21 20:24 ecoshea5
[root@smallvault1 /]#
Backup Server Structure
# losetup -f /var/backups/siteground200.com/exaera30
# losetup -a
/dev/loop0: [0811]:909901835 (/var/backups/siteground200.com/exaera30)
# vgchange -K -ay
2 logical volume(s) in volume group "exaera30" now active
# lvs
LV         VG       Attr       LSize  Pool      Origin Data%  Meta%
1437516546 exaera30 Vwi-a-t--- 30.00g coregroup        2.09
coregroup  exaera30 twi-a-t--- 29.82g                  2.10   1.54
#
Backup Server Structure
[root@smallvault1 /]# mount /dev/exaera30/1437516546 /mnt/...
[root@smallvault1 /]# ls -l /mnt/exaera30/1437516546
total 40
drwxr-xr-x  5 root root  4096 Jul 21 17:09 configs
drwxr-xr-x  3 963  959   4096 Dec 23  2014 etc
drwx--x--x 14 963  959   4096 Dec 23  2014 home
drwx------  2 root root 16384 Jul 21 17:09 lost+found
drwxr-x---  9 963  959   4096 Feb 29  2012 mail
drwxr-xr-x  2 root root  4096 Jul 21 17:09 mysql
drwxr-xr-x  2 root root  4096 Jul 21 17:09 pgsql
[root@smallvault1 /]#
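The three manual steps from the last slides (loop device, volume group activation, mount) fit in one function. A sketch only: backup_attach is a hypothetical name, the mount point is left as a parameter, and unlike the slide it passes the volume group name to vgchange so that only that group is activated. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Attach an account image file and mount one snapshot from it.
#   $1 image file, $2 volume group, $3 logical volume, $4 mount point
backup_attach() {
    image="$1"; vg="$2"; lv="$3"; mnt="$4"
    run losetup -f "$image" || return 1        # file -> first free loop device
    run vgchange -K -ay "$vg" || return 1      # -K: ignore the activation skip flag
    run mount "/dev/$vg/$lv" "$mnt"
}

# Usage (as root):
#   backup_attach /var/backups/siteground200.com/exaera30 exaera30 1437516546 <mountpoint>
```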
Account Backup/Restore
● Configuration
– Extractor scripts
– Intractor scripts
● Files
● Mails
● SQLs
– MySQL, mysqldump
– PgSQL, pg_dump
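For the SQL part, a dump per database into the mysql and pgsql directories of the mounted backup could look like the sketch below. This is not the actual extractor script: the function names, the file layout and the --single-transaction choice are assumptions. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Dump one MySQL database to <backup dir>/mysql/<db>.sql
# --single-transaction gives a consistent dump for transactional tables.
dump_mysql() {
    run mysqldump --single-transaction --result-file="$2/mysql/$1.sql" "$1"
}

# Dump one PostgreSQL database to <backup dir>/pgsql/<db>.sql
dump_pgsql() {
    run pg_dump -f "$2/pgsql/$1.sql" "$1"
}

# Usage: dump_mysql shop /mnt/backup ; dump_pgsql shop /mnt/backup
```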
Full server restore
Avatar Master
Host Server    Backup Server
Report status
account 1
ns1 & ns2 restore here
account 3