Linux cgroups and namespaces
DESCRIPTION
Luiz Viana and André Ferraz (Cazé) give an overview of the native technologies for resource isolation in Linux environments.
TRANSCRIPT
Introduction to Linux Control Groups and Namespaces
Andre Ferraz @deferraz
Luiz Viana @luizxx
Delivery Engineering Team
What is it?
• Basically, a kernel feature that allows you to allocate resources among groups of tasks running on a system.
• Provides a way to hierarchically group and label processes, and to apply resource limits to them.
Resource allocation
• CPU time and scheduling
• System memory / swap area
• Network bandwidth and namespaces
• Block device bandwidth and IOPS
• Device access and isolation
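As a rough illustration of how these resources are controlled: in cgroup v1, each subsystem exposes its parameters as plain files inside the cgroup's directory. The sketch below assumes one hierarchy per subsystem mounted under a base directory (often /sys/fs/cgroup, but that depends on the system). The helper names and the "webusers" group are hypothetical.

```python
import os


def set_parameter(base, subsystem, group, parameter, value):
    """Write one parameter file of a cgroup, creating the group directory if needed."""
    group_dir = os.path.join(base, subsystem, group)
    # On a real cgroup filesystem, creating the directory creates the cgroup.
    os.makedirs(group_dir, exist_ok=True)
    path = os.path.join(group_dir, parameter)
    with open(path, "w") as handle:
        handle.write(str(value))
    return path


def attach_task(base, subsystem, group, pid):
    """Move a task into a cgroup by writing its PID to the group's tasks file."""
    return set_parameter(base, subsystem, group, "tasks", pid)
```

On a real system this needs root, for example set_parameter("/sys/fs/cgroup", "cpu", "webusers", "cpu.shares", 512) followed by attach_task("/sys/fs/cgroup", "cpu", "webusers", os.getpid()).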
Implications
• Because a task can belong to only a single cgroup in any one hierarchy, there is only one way that a task can be limited or affected by any single subsystem. This is logical: a feature, not a limitation.
• You can group several subsystems together so that they affect all tasks in a single hierarchy. Because cgroups in that hierarchy have different parameters set, those tasks will be affected differently.
• If you no longer need to split subsystems among separate hierarchies, you can remove a hierarchy and attach its subsystems to an existing one.
• The design allows for simple cgroup usage, for example setting a few parameters for specific tasks in a single hierarchy with just the cpu and memory subsystems attached.
• The design also allows for highly specific configuration: each task (process) on a system could be a member of each hierarchy, each of which has a single attached subsystem. Such a configuration would give the system administrator absolute control over all parameters for every single task.
• If you limit the resources available to a user, more of that user's processes will be waiting for resources, so the load average on your server will be consistently higher.
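The rule that a task sits in exactly one cgroup per hierarchy can be observed in /proc/<pid>/cgroup, which lists one line per hierarchy in the form hierarchy-ID:subsystem-list:cgroup-path. The following is a minimal sketch of a parser for that text. The function names are hypothetical, and the sample paths in the usage note are made up.

```python
from typing import Dict, List, Tuple


def parse_proc_cgroup(text: str) -> List[Tuple[int, List[str], str]]:
    """Parse the content of /proc/<pid>/cgroup into (hierarchy_id, subsystems, path)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form "hierarchy-ID:subsystem-list:cgroup-path".
        hierarchy_id, subsystems, path = line.split(":", 2)
        names = [name for name in subsystems.split(",") if name]
        entries.append((int(hierarchy_id), names, path))
    return entries


def cgroup_by_subsystem(text: str) -> Dict[str, str]:
    """Map each subsystem to the single cgroup the task belongs to in its hierarchy."""
    result = {}
    for _, names, path in parse_proc_cgroup(text):
        for name in names:
            result[name] = path
    return result
```

For instance, cgroup_by_subsystem(open("/proc/self/cgroup").read()) shows where the current process sits. With the cpu and cpuacct subsystems attached to the same hierarchy, both names map to the same path.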
Using control groups: cgconfig
• The cgconfig service installed with the libcgroup package provides a convenient way to create hierarchies, attach subsystems to hierarchies, and manage cgroups within those hierarchies.
• It is recommended that you use cgconfig to manage hierarchies and cgroups on your system.
• The default /etc/cgconfig.conf file installed with the libcgroup package creates and mounts an individual hierarchy for each subsystem, and attaches the subsystems to these hierarchies. The cgconfig service also lets you create configuration files in the /etc/cgconfig.d/ directory and invoke them from /etc/cgconfig.conf.
• If you stop the cgconfig service (with the service cgconfig stop command), it unmounts all the hierarchies that it mounted.
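To give an idea of what cgconfig reads, the sketch below renders a "group" stanza in the cgconfig.conf syntax (group name, then one block per subsystem, each holding parameter = "value"; lines). The function name, the "webusers" group and the chosen values are assumptions for illustration only.

```python
def render_group(name, settings):
    """Render a cgconfig.conf 'group' stanza.

    settings maps a subsystem name to a dict of parameter -> value.
    """
    lines = ["group %s {" % name]
    for subsystem, parameters in settings.items():
        lines.append("    %s {" % subsystem)
        for parameter, value in parameters.items():
            # Values are written quoted, followed by a semicolon.
            lines.append('        %s = "%s";' % (parameter, value))
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling render_group("webusers", {"cpu": {"cpu.shares": 512}, "memory": {"memory.limit_in_bytes": "1G"}}) produces text that could be saved as a file under /etc/cgconfig.d/.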
Using control groups: cgred
• Cgred (cgrulesengd daemon) is a service that moves tasks into cgroups according to parameters set in the /etc/cgrules.conf file.
• Entries in the /etc/cgrules.conf file can take one of the two forms:
user subsystems control_group
user:command subsystems control_group
• A group name can be used instead of a user name by prefixing it with the "@" character.
• More than one subsystem can be specified in a comma-separated list.
• Commands are identified by the process name or full command path of a process.
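The two rule forms above can be illustrated with a small parser. This is only a sketch: it handles a single rule line with three whitespace-separated fields, and it does not cover wildcards, continuation lines or commands containing spaces. The Rule type and parse_rule name are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    user: str                # user name, or "@group" for a group
    command: Optional[str]   # process name or full path, if given
    subsystems: List[str]
    control_group: str


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one 'user[:command] subsystems control_group' line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment
    fields = stripped.split()
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got %d: %r" % (len(fields), line))
    who, subsystems, control_group = fields
    user, _, command = who.partition(":")
    return Rule(user, command or None, subsystems.split(","), control_group)
```

For example, parse_rule("@admin:/usr/bin/rsync cpu,memory admingroup/") yields a rule for the group "admin", restricted to the rsync command, covering the cpu and memory subsystems.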
Using control groups: reaper
• Reaper allows you to manage groups dynamically on shared multi-user environments.
• Can be extended to work on any environment by creating a function to validate users.
• Entirely written in Python and easy to modify.
• Limit exceptions can be created using the command line interface.
• Does not depend on external agents.
• Uses standard components from libcgroup, available in most Linux distributions.
Available on GitHub: https://github.com/lviana/reaper
Obtaining cgroups information
• Listing controllers
• # lssubsys -m controllers
• # cat /proc/cgroups
• Finding control groups
• # lscgroup
• # lscgroup cpuset:adminusers
• Display parameters
• # cgget -r parameter list_of_cgroups
• # cgget -g cpuset /
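The same controller information can be read programmatically. /proc/cgroups is a small table with the columns subsys_name, hierarchy, num_cgroups and enabled, preceded by a "#" header line. A minimal sketch of a parser, with a hypothetical function name:

```python
def parse_controllers(text):
    """Parse the content of /proc/cgroups into a dict keyed by subsystem name."""
    controllers = {}
    for line in text.splitlines():
        # Skip blank lines and the "#subsys_name ..." header.
        if not line.strip() or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return controllers
```

On a Linux host, parse_controllers(open("/proc/cgroups").read()) returns one entry per controller known to the running kernel.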
Systemd
• System service manager for Linux that provides parallelization capabilities, keeps track of processes using Linux control groups, offers on-demand starting of daemons and implements an elaborate transactional dependency-based service control logic.
• A cgroup is bound to a systemd unit that is configured with a unit file and managed with systemd's command-line utilities.
• Cgroups in systemd can be transient or persistent.
Transient cgroups
• Using transient cgroups, you can set limits on resources consumed by the service during its runtime.
• Applications can create transient cgroups dynamically by using API calls to systemd.
• With systemd-run --scope, commands are started directly from the systemd-run process and thus inherit the execution environment of the caller.
• In that case, commands run in scope units and execution is synchronous.
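A transient cgroup is typically created with systemd-run. The sketch below only builds the argument list (it does not execute anything), using the --scope, --unit=, --slice= and --property= options of systemd-run. The function name and the unit name are hypothetical. CPUShares and MemoryLimit are assumed here as cgroup v1 era property names, and newer systemd versions prefer other names.

```python
def systemd_run_command(command, unit=None, slice_name=None,
                        properties=None, scope=True):
    """Build an argv list for running a command in a transient unit."""
    argv = ["systemd-run"]
    if scope:
        # Run in a scope unit: started directly by systemd-run, synchronously.
        argv.append("--scope")
    if unit:
        argv.append("--unit=%s" % unit)
    if slice_name:
        argv.append("--slice=%s" % slice_name)
    for key, value in (properties or {}).items():
        argv.append("--property=%s=%s" % (key, value))
    argv.extend(command)
    return argv
```

The resulting list can be handed to subprocess.run() by a privileged caller, for example systemd_run_command(["top", "-b"], unit="toptest", properties={"MemoryLimit": "512M"}).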
Persistent cgroups
• You can assign a persistent cgroup to a systemd service by editing its unit configuration file.
• It can be used to manage services that are started automatically.
• Unit configuration files are located in the /usr/lib/systemd/system/ directory.
• Temporary changes can be set using the systemctl command.
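As a sketch of both approaches, the helpers below render a [Service] fragment with resource-control settings and build a systemctl set-property command line, where the --runtime option keeps the change temporary. The function names and "httpd.service" are hypothetical, and CPUShares and MemoryLimit are assumed as cgroup v1 era property names. Nothing is written to disk or executed.

```python
def render_service_limits(properties):
    """Render a [Service] section holding resource-control settings."""
    lines = ["[Service]"]
    for key, value in properties.items():
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


def set_property_command(unit, properties, runtime=False):
    """Build an argv list for 'systemctl set-property'."""
    argv = ["systemctl", "set-property"]
    if runtime:
        # --runtime makes the change temporary (lost on reboot).
        argv.append("--runtime")
    argv.append(unit)
    argv.extend("%s=%s" % (key, value) for key, value in properties.items())
    return argv
```

For example, set_property_command("httpd.service", {"CPUShares": 600}, runtime=True) gives the arguments for a temporary change, while the text from render_service_limits() could be merged into the unit's configuration for a persistent one.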
Where the f*ck do I use it?
• Prioritize database I/O
• Limit resources available to end users
• Optimizing processor usage
• Control network access
• Isolate processes from devices
• Optimize available physical resources
• Set network traffic priority
Projects using it
• Linux Containers / LXC (https://linuxcontainers.org/)
• Docker (http://docker.io)
• Apache Mesos (http://mesos.apache.org)
• OpenStack (http://www.openstack.org)
• Locaweb (http://github.com/locaweb)
Namespaces, what is it?
• Lightweight process isolation
• Processes can have different views of the system than other processes
• Old concept: introduced in 1992 in Plan 9 (http://www.cs.bell-labs.com/sys/doc/names.html)
• No hypervisor
• setns() syscall
Namespaces, types
• Mount points / filesystems (MNT) [first created in 2002 by Al Viro]
• Processes (PID)
• Network (NET)
• System V IPC (IPC)
• Hostname (UTS)
• Users / UIDs (USER)
Namespaces, flags
  Flag            Kernel version   Required capability
• CLONE_NEWNS     2.4.19           CAP_SYS_ADMIN
• CLONE_NEWUTS    2.6.19           CAP_SYS_ADMIN
• CLONE_NEWIPC    2.6.19           CAP_SYS_ADMIN
• CLONE_NEWPID    2.6.24           CAP_SYS_ADMIN
• CLONE_NEWNET    2.6.29           CAP_SYS_ADMIN
• CLONE_NEWUSER   3.8              No capability required
Namespaces, syscalls
clone() - creates a new process in a new namespace
unshare() - creates a new namespace and moves the calling process into it
setns() - join an existing namespace
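A rough sketch of how these calls look from user space: the block below defines the CLONE_NEW* flag values from <linux/sched.h>, wraps unshare(2) through libc with ctypes, and lists the namespaces of a process by reading the links under /proc/<pid>/ns. The helper names and the short keys ("uts", "net", ...) are hypothetical. Calling unshare() usually requires the capabilities listed in the flags section, so only the read-only helper is safe to run unprivileged.

```python
import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def combine_flags(kinds):
    """OR together the flags for the requested namespace kinds."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(kinds):
    """Call unshare(2) through libc; raises OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def current_namespaces(pid="self"):
    """Return {name: link target} for the entries under /proc/<pid>/ns."""
    ns_dir = "/proc/%s/ns" % pid
    if not os.path.isdir(ns_dir):
        return {}  # not Linux, or /proc is not mounted
    result = {}
    for name in os.listdir(ns_dir):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result
```

Two processes share a namespace when the link targets reported by current_namespaces() are identical. After a successful unshare(["uts"]), the "uts" entry of the calling process changes.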
Namespaces, network ns example
# ip netns add newnet
# ip link add veth0 type veth peer name veth1
# ip link set veth1 netns newnet
# ip netns exec newnet ip link list
# ip netns exec newnet bash
Namespaces, application server support
• uWSGI gained full namespace support in 1.9/2.0
• Additional isolated filesystems
• You can detach single components to increase isolation
More information on:
http://uwsgi-docs.readthedocs.org/en/latest/Namespaces.html
Reference
• https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
• http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cgroups/
• https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/
• https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/
• http://uwsgi-docs.readthedocs.org/en/latest/Namespaces.html