lsa2 - 02 control groups
Control Groups
What do we have?
cpuset - whole cores and CPU mapping
cpuacct - CPU cycle accounting
cpu - less than core granularity
memory - limits and accounting
blkio - limits and accounting
net_cls - network classification
net_prio - network priority
Freezer + checkpoint/restore - migration
General structure
tasks - attach a task (thread) and show the list of threads
cgroup.procs - show the list of processes
cgroup.event_control - an interface for eventfd()
# mount -t cgroup none /cgroups
# mount -t cgroup -o cpuset cpuset /cg/cpuset
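The files listed above are used by plain file operations. As a minimal sketch: a child group is a directory under the mounted hierarchy, and a process is attached by writing its PID to `tasks`. The helper name `cg_attach` and the path `/cgroups/demo` are hypothetical; on a real cgroup mount the kernel creates the control files and root privileges are normally required.

```shell
# cg_attach GROUP_DIR PID   (hypothetical helper, not part of any cgroup tool)
# Create the group directory if needed and attach PID by writing it to "tasks".
# GROUP_DIR is assumed to live under a mounted cgroup hierarchy, e.g. /cgroups/demo.
cg_attach() {
    group_dir=$1
    pid=$2
    mkdir -p "$group_dir" || return 1
    # One PID per write; the kernel moves that task into the group.
    echo "$pid" > "$group_dir/tasks"
}
```

Typical use would be `cg_attach /cgroups/demo $$` to move the current shell, followed by `cat /cgroups/demo/tasks` to list the attached threads.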
cpuset
Physical CPU & Memory limits
cpuset.cpus - a list of allowed CPUs
cpuset.mems - a list of allowed memory nodes
cpuset.cpu_exclusive - 0/1: are the CPUs exclusive to this group (no other group, apart from ancestors and descendants, can use them)?
cpuset.mem_exclusive or cpuset.mem_hardwall - 0/1: are the memory nodes exclusive to this group (no other group, apart from ancestors and descendants, can use them)?
cpuset.sched_load_balance - 0/1: should the kernel balance tasks between the CPUs in the current cpuset?
cpuset.sched_relax_domain_level
Documentation/cgroups/cpusets.txt
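The value written to cpuset.cpus is a comma-separated list of CPU numbers and ranges, for example "0-3,8". A small sketch of how such a list can be interpreted follows; the helper name `cpulist_count` is hypothetical and it only counts the CPUs, without checking that they exist on the machine.

```shell
# cpulist_count LIST   (hypothetical helper)
# Count the CPUs named by a cpuset-style list such as "0-3,8".
cpulist_count() {
    count=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            # A range "first-last" contributes last - first + 1 CPUs.
            *-*) first=${part%-*}; last=${part#*-}
                 count=$((count + last - first + 1)) ;;
            # A single number contributes one CPU.
            *)   count=$((count + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$count"
}
```

For instance, `cpulist_count "$(cat /cg/cpuset/cpuset.cpus)"` would report how many CPUs the root cpuset allows, assuming the mount point used earlier.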
cpuset
Physical CPU & Memory limits
cpuset.sched_relax_domain_level
-1 : no request; use system default or follow request of others
0 : no search
1 : search siblings (hyperthreads in a core)
2 : search cores in a package
3 : search cpus in a node [= system wide on non-NUMA system]
on NUMA systems only:
4 : search nodes in a chunk of node
5 : search system wide
Documentation/cgroups/cpusets.txt
CPU accounting
cpuacct.usage - CPU usage combined for all CPUs (in nanoseconds)
cpuacct.usage_percpu - CPU usage per CPU (in nanoseconds)
cpuacct.stat - CPU time split into user/system (in USER_HZ)
Documentation/cgroups/cpuacct.txt
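Because the combined usage counter (cpuacct.usage) is cumulative and in nanoseconds, a utilisation figure comes from sampling it twice. The sketch below does the arithmetic only; the helper name `cpu_percent` is hypothetical, and 100 means one fully busy CPU, so values above 100 are possible on multi-CPU machines.

```shell
# cpu_percent START_NS END_NS INTERVAL_S   (hypothetical helper)
# Convert two samples of a cumulative nanosecond counter, taken INTERVAL_S
# seconds apart, into a percentage of one CPU (integer, rounded down).
cpu_percent() {
    echo $(( ($2 - $1) * 100 / ($3 * 1000000000) ))
}
```

A possible use: read cpuacct.usage into `a`, `sleep 5`, read it again into `b`, then call `cpu_percent "$a" "$b" 5`.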
CPU
CPU scheduler limits (CONFIG_CGROUP_SCHED)
cpu.shares: the amount of cpu shares available to the group
cpu.cfs_quota_us: the total available run-time within a period (in microseconds) (-1 no limit)
cpu.cfs_period_us: the length of a period (in microseconds) (default 100ms)
cpu.stat: exports throttling statistics
nr_periods: Number of enforcement intervals that have elapsed.
nr_throttled: Number of times the group has been throttled/limited.
throttled_time: The total time duration (in nanoseconds) for which entities of the group have been throttled.
Documentation/scheduler/sched-bwc.txt
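One way to read these statistics is as a ratio: the share of enforcement periods in which the group was throttled. The sketch below assumes cpu.stat contains one "name value" pair per line, as listed above; the helper name `throttled_percent` is hypothetical.

```shell
# throttled_percent STAT_FILE   (hypothetical helper)
# Print the percentage of enforcement periods in which the group was throttled,
# based on the nr_periods and nr_throttled lines of a cpu.stat file.
throttled_percent() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0) printf "%d\n", throttled * 100 / periods
            else print 0
        }
    ' "$1"
}
```

A persistently high value suggests the quota is too small for the workload; a value of 0 means the limit was never hit.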
CPU examples
1. Limit a group to 1 CPU worth of runtime. If period is 250ms and quota is also 250ms, the group will get 1 CPU worth of runtime every 250ms.
# echo 250000 > cpu.cfs_quota_us    # quota = 250ms
# echo 250000 > cpu.cfs_period_us   # period = 250ms
2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine. With 500ms period and 1000ms quota, the group can get 2 CPUs worth of runtime every 500ms.
# echo 1000000 > cpu.cfs_quota_us   # quota = 1000ms
# echo 500000 > cpu.cfs_period_us   # period = 500ms
The larger period here allows for increased burst capacity.
3. Limit a group to 20% of 1 CPU. With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU.
# echo 10000 > cpu.cfs_quota_us     # quota = 10ms
# echo 50000 > cpu.cfs_period_us    # period = 50ms
By using a small period here we are ensuring a consistent latency response at the expense of burst capacity.
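All three examples follow the same arithmetic: quota = (desired percentage of one CPU) x period / 100. A minimal sketch of that calculation, with a hypothetical helper name `cfs_quota_us`:

```shell
# cfs_quota_us PERCENT PERIOD_US   (hypothetical helper)
# PERCENT is the desired share of ONE CPU (200 means two CPUs' worth).
# Prints the value to write to cpu.cfs_quota_us for the given period.
cfs_quota_us() {
    echo $(( $1 * $2 / 100 ))
}
```

For example, `cfs_quota_us 20 50000` gives the 10ms quota from example 3, and `cfs_quota_us 200 500000` gives the 1000ms quota from example 2.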
memory
Only Memory
memory.usage_in_bytes - show current res_counter usage for memory
memory.limit_in_bytes - set/show limit of memory usage
memory.failcnt - show the number of times memory usage hit the limit
memory.max_usage_in_bytes - show max memory usage recorded
Memory + Swap
memory.memsw.usage_in_bytes - show current res_counter usage for memory+swap
memory.memsw.limit_in_bytes - set/show limit of memory+swap usage
memory.memsw.failcnt - show the number of times memory+swap usage hit the limit
memory.memsw.max_usage_in_bytes - show max memory+swap usage recorded
memory.soft_limit_in_bytes - set/show soft limit of memory usage
memory.stat - show various statistics
memory.use_hierarchy - set/show hierarchical account enabled
memory.force_empty - trigger forced move charge to parent
memory.pressure_level - set memory pressure notifications
memory.swappiness - set/show swappiness parameter of vmscan
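The usage and limit files are plain byte counts, so they combine easily. The sketch below reports how full a group is; the helper name `mem_usage_percent` is hypothetical and it expects the path of a group directory containing memory.usage_in_bytes and memory.limit_in_bytes.

```shell
# mem_usage_percent GROUP_DIR   (hypothetical helper)
# Print current memory usage as an integer percentage of the group's limit.
mem_usage_percent() {
    usage=$(cat "$1/memory.usage_in_bytes") || return 1
    limit=$(cat "$1/memory.limit_in_bytes") || return 1
    # Guard against a zero limit to avoid division by zero.
    [ "$limit" -gt 0 ] || return 1
    echo $(( usage * 100 / limit ))
}
```

A group with no limit set reports a very large limit value, so the result is then simply 0.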
memory
memory.move_charge_at_immigrate - set/show controls of moving charges
memory.oom_control - set/show oom controls
memory.numa_stat - show memory usage per NUMA node
Kernel Memory limits
memory.kmem.limit_in_bytes - set/show hard limit for kernel memory
memory.kmem.usage_in_bytes - show current kernel memory allocation
memory.kmem.failcnt - show the number of times kernel memory usage hit the limit
memory.kmem.max_usage_in_bytes - show max kernel memory usage recorded
memory.kmem.tcp.limit_in_bytes - set/show hard limit for tcp buf memory
memory.kmem.tcp.usage_in_bytes - show current tcp buf memory allocation
memory.kmem.tcp.failcnt - show the number of times tcp buf memory usage hit the limit
memory.kmem.tcp.max_usage_in_bytes - show max tcp buf memory usage recorded
blkio statistics
blkio.io_wait_time
blkio.io_merged
blkio.io_queued
blkio.avg_queue_size
blkio.group_wait_time
blkio.throttle.io_serviced
blkio.throttle.io_service_bytes
blkio.sectors
blkio.io_service_bytes
blkio.io_serviced
blkio.io_service_time
blkio.*_recursive
blkio.reset_stats - write an int to it
blkio limiting
blkio.weight - allowed range 10 - 1000
blkio.weight_device - weight per device
blkio.leaf_weight[_device] - when competing with child cgroups
blkio.time - disk time allocated in milliseconds
blkio.throttle.read_bps_device
blkio.throttle.write_bps_device
blkio.throttle.read_iops_device
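The throttle files take one rule per write in the form "major:minor value", where major:minor identifies the block device. A small sketch that builds a bandwidth rule follows; the helper name `blkio_bps_rule` is hypothetical and the device numbers in the usage note are only an example.

```shell
# blkio_bps_rule MAJOR MINOR MIB_PER_SEC   (hypothetical helper)
# Build a rule line for blkio.throttle.read_bps_device or write_bps_device:
# "<major>:<minor> <bytes per second>".
blkio_bps_rule() {
    echo "$1:$2 $(( $3 * 1024 * 1024 ))"
}
```

For example, `blkio_bps_rule 8 16 1 > blkio.throttle.read_bps_device`, run inside the group's directory, would limit reads from device 8:16 to 1 MiB per second.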
Network
net_cls - adds a network class to each cgroup so you can later limit it with tc
Documentation/cgroups/net_cls.txt
net_prio - prioritizes network traffic per interface
Documentation/cgroups/net_prio.txt
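For net_cls, the class is written to net_cls.classid as a hexadecimal number of the form 0xAAAABBBB, where AAAA is the tc major handle and BBBB the minor handle. A sketch of that encoding, with a hypothetical helper name `net_cls_classid` that takes both handles as hex digits (the way tc writes them):

```shell
# net_cls_classid MAJOR_HEX MINOR_HEX   (hypothetical helper)
# Encode a tc handle "major:minor" (both in hex, as tc prints them) into the
# 0xAAAABBBB form expected by net_cls.classid; leading zeros are omitted.
net_cls_classid() {
    printf '0x%x%04x\n' "0x$1" "0x$2"
}
```

For the tc handle 10:1 this prints 0x100001, which could then be written to the group's net_cls.classid file.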
Freezer + CRIU
freezer.state
THAWED
FREEZING
FROZEN
freezer.self_freezing - 0 (thawed) / 1 (frozen)
freezer.parent_freezing - 0 if no ancestor is frozen, 1 otherwise
CRIU - Checkpoint and Restore In Userspace
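Freezing is not instantaneous: after FROZEN is written, freezer.state can report FREEZING for a while before it settles. A minimal sketch that requests a freeze and polls until it completes; the helper name `freeze_and_wait` and the one-second polling interval are assumptions made for illustration.

```shell
# freeze_and_wait GROUP_DIR [TRIES]   (hypothetical helper)
# Ask the freezer to stop all tasks in the group, then poll freezer.state
# once per second until it reads FROZEN. Returns non-zero on timeout.
freeze_and_wait() {
    tries=${2:-10}
    echo FROZEN > "$1/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "$1/freezer.state")" = "FROZEN" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Writing THAWED to the same file resumes the tasks.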