Linux cgroups: Difference between revisions

From NovaOrdis Knowledge Base
 
(32 intermediate revisions by the same user not shown)
Line 1: Line 1:
=External=
=External=


* https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch01
* https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/
* https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/resource_management_guide/chap-introduction_to_control_groups
* https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/resource_management_guide/
* https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
* https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
* https://en.wikipedia.org/wiki/Cgroups
* https://en.wikipedia.org/wiki/Cgroups
Line 11: Line 11:


* [[Docker Concepts#cgroups|Docker Concepts]]
* [[Docker Concepts#cgroups|Docker Concepts]]
* [[Linux Namespaces]]
=TODO=
<font color=red>
* The relationship between cgroups settings and their parents' setting. How does a hierarchy translate into effective values.
</font>


=Overview=
=Overview=


cgroups is a Linux kernel feature that allows allocation of resources (CPU, system memory, network bandwidth, or a combination of these) among user-defined ''groups of processes'' running on the system. cgroups not only track groups of processes, but they also expose metrics about CPU, memory and block I/O usage.  
cgroups is a Linux kernel feature that allows allocation of resources (CPU, system memory, network bandwidth, or a combination of these) among user-defined ''groups of processes'' running on the system. cgroups stands for "control groups". cgroups not only track groups of processes, but they also expose metrics about CPU, memory and block I/O usage.


cgroups are exposed through a pseudo-filesystem available at /sys/fs/cgroup (older systems expose it at /cgroup). The sub-directories of the cgroup pseudo-filesystem root correspond to different cgroups hierarchies: [[#cpu|cpu]], [[#freezer|freezer]], [[#blkio|blkio]].
cgroups are exposed through a pseudo-filesystem available at /sys/fs/cgroup (older systems expose it at /cgroup). The sub-directories of the cgroup pseudo-filesystem root correspond to different cgroups hierarchies: [[#cpu|cpu]], [[#freezer|freezer]], [[#blkio|blkio]].
Line 54: Line 61:
==cpu==
==cpu==


Uses the scheduler to provide cgroup tasks access to the CPU.
{{External|https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu}}
 
Uses the scheduler to provide cgroup tasks access to the CPU. Usually, the access to CPU is scheduled using the [[Linux_Resource_Management#CFS_Scheduler|CFS scheduler]], and the control parameters make that obvious by using "cfs" in their name. The [[Linux_Resource_Management#RT_Scheduler|RT scheduler]] is also available.
 
cgroups can be used to control two things:
 
* the [[#Controlling_Relative_Share_of_CPU|relative share of CPU time]] to be allocated to the tasks in the cgroup.
* [[#Controlling_CPU_Ceiling|ceiling enforcement]]: a hard limit of the amount of CPU a cgroup can utilize, to prevent the "noisy neighbor" problem, given the fact that arbitrary processes can use all available CPU if they are allowed to.
 
===Controlling Relative Share of CPU===
 
====cpu.shares====
 
The relative share of CPU to be allocated to the tasks in a cgroup can be controlled with an integer value specified in the "cpu.shares" file of the cgroup. The integer value specifies the share of CPU time available to the tasks in the cgroup, relative to all other tasks being scheduled at the same time. For example, tasks in two cgroups that have cpu.shares set to 100 will receive equal CPU time, but tasks in a cgroup that has cpu.shares set to 200 receive twice the CPU time of tasks in a cgroup where cpu.shares is set to 100. However, the exact value of that time depends on who else is consuming CPU at that time. The value specified in the cpu.shares file must be 2 or higher.
 
In case of a multi-core system, shares specified in "cpu.shares" are distributed across all CPU cores of the system.
 
As a consequence of how CFS works, it is difficult to predict how much CPU an arbitrary task will actually get, because that depends on the process population on a node and on what those processes do.
 
===<span id='Controlling_CPU_Ceiling'></span>Controlling CPU Throttling===
 
{{External|https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt}}
 
cgroups make it possible to limit the amount of CPU time allocated to a cgroup, which helps prevent the "noisy neighbor" problem - processes that consume as much CPU as they can get, reducing everyone else's share.
 
====cpu.cfs_period_us====
 
Specifies a period of time in microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources should be reallocated. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set [[#cpu.cfs_quota_us|cpu.cfs_quota_us]] to 200,000 and [[#cpu.cfs_period_us|cpu.cfs_period_us]] to 1,000,000. The upper limit of the cpu.cfs_period_us parameter is 1 second and the lower limit is 1,000 microseconds.
 
====cpu.cfs_quota_us====
 
Specifies the total amount of time in microseconds (µs, represented here as "us") for which all tasks in a cgroup can run during one period (as defined by [[#cpu.cfs_period_us|cpu.cfs_period_us]]). As soon as tasks in a cgroup use up all the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200,000 and cpu.cfs_period_us to 1,000,000. Note that the quota and period parameters operate on a CPU basis. To allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200,000 and cpu.cfs_period_us to 100,000.
 
Setting the value in cpu.cfs_quota_us to -1 indicates that the cgroup does not adhere to any CPU time restrictions. This is also the default value for every cgroup (except the root cgroup).
 
====cpu.stat====
 
* '''nr_periods''': number of period intervals (as specified in [[#cpu.cfs_period_us|cpu.cfs_period_us]]) that have elapsed.
* '''nr_throttled''': number of times tasks in a cgroup have been throttled (that is, not allowed to run because they have exhausted all of the available time as specified by their quota).
* '''throttled_time''': the total time duration (in nanoseconds) for which tasks in a cgroup have been throttled.


==cpuacct==
==cpuacct==


{{External|https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuacct}}
{{External|https://docs.docker.com/config/containers/runmetrics/#cpu-metrics-cpuacctstat}}
{{External|https://docs.docker.com/config/containers/runmetrics/#cpu-metrics-cpuacctstat}}


Generates automatic reports on CPU resources. Statistics are maintained in "cpuacct.stat", which contains the CPU usage ''accumulated'' by the processes of the group, broken down into [[#user|user]] and [[#system|system]] time. The times are expressed in [[Linux_7_General_Concepts#USER_HZ|USER_HZ]].
Generates automatic reports on CPU resources.  
 
===cpuacct.stat===
 
"cpuacct.stat" contains the CPU usage ''accumulated'' by the processes of the group, broken down into [[#user|user]] and [[#system|system]] time. The times are expressed in [[Linux_General_Concepts#USER_HZ|USER_HZ]].


===user===
====user====


"user"  time is the amount of time a process has direct control of the CPU, executing process code. Also see [[/proc/stat#cpu|/proc/stat cpu]].
"user"  time is the amount of time a process has direct control of the CPU, executing process code. Also see [[/proc/stat#cpu|/proc/stat cpu]].


===system===
====system====


"system" time is the time the kernel is executing system calls on behalf of the process. Also see [[/proc/stat#cpu|/proc/stat cpu]].
"system" time is the time the kernel is executing system calls on behalf of the process. Also see [[/proc/stat#cpu|/proc/stat cpu]].
===cpuacct.usage===
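Contains the total CPU time, in nanoseconds (10<sup>-9</sup> seconds), consumed by all tasks in the cgroup, including tasks in descendant cgroups. When read from the root cgroup, it covers all tasks on the host.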
===cpuacct.usage_percpu===
Contains the CPU time, in nanoseconds (10<sup>-9</sup> seconds), consumed on each CPU by all tasks in the cgroup, including tasks in descendant cgroups. Per-CPU usage can help you identify core imbalances, which can be caused by bad configuration.


==cpuset==
==cpuset==
Line 87: Line 146:


More details: {{External|https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io}}
More details: {{External|https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io}}
===Memory Limit===
The value for memory limit is available in:
/sys/fs/cgroup/memory/memory.limit_in_bytes


==net_cls==
==net_cls==
Line 100: Line 165:
==perf_event==
==perf_event==


=Docker and cgroups=
{{External|https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io}}
=Operations=
=Operations=



Latest revision as of 02:31, 2 January 2021

External

Internal

TODO

  • The relationship between cgroups settings and their parents' setting. How does a hierarchy translate into effective values.

Overview

cgroups is a Linux kernel feature that allows allocation of resources (CPU, system memory, network bandwidth, or a combination of these) among user-defined groups of processes running on the system. cgroups stands for "control groups". cgroups not only track groups of processes, but they also expose metrics about CPU, memory and block I/O usage.

cgroups are exposed through a pseudo-filesystem available at /sys/fs/cgroup (older systems expose it at /cgroup). The sub-directories of the cgroup pseudo-filesystem root correspond to different cgroups hierarchies: cpu, freezer, blkio.

This command returns a list of the cgroups that are mounted:

cat /proc/mounts | grep cgroup
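
The same filtering can be sketched in Python. This is a minimal illustration, not a replacement for the command above: the function name cgroup_mounts and the sample content are made up for the example, and the filter matches on the filesystem type field rather than on any occurrence of the string "cgroup", so the tmpfs mounted at /sys/fs/cgroup is not reported.

```python
from typing import Dict, List

# Filesystem types used by cgroup v1 and cgroup v2 mounts.
CGROUP_FS_TYPES = {"cgroup", "cgroup2"}


def cgroup_mounts(mounts_text: str) -> Dict[str, List[str]]:
    """Map each cgroup mount point to its list of mount options."""
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in CGROUP_FS_TYPES:
            mounts[fields[1]] = fields[3].split(",")
    return mounts


# Illustrative content only; real systems will differ.
SAMPLE_MOUNTS = """\
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
"""

if __name__ == "__main__":
    for mount_point, options in cgroup_mounts(SAMPLE_MOUNTS).items():
        print(mount_point, options)
```

On a live system, the content of /proc/mounts can be passed in place of SAMPLE_MOUNTS.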

The control groups subsystems known to the system are available in /proc/cgroups:

#subsys_name	hierarchy	num_cgroups	enabled
cpuset	6	13	1
cpu	4	89	1
cpuacct	4	89	1
memory	8	89	1
devices	3	83	1
freezer	10	13	1
net_cls	5	13	1
blkio	11	89	1
perf_event	2	13	1
hugetlb	9	13	1
pids	7	13	1
net_prio	5	13	1
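
The table above is easy to process programmatically. The following sketch parses a few of its rows; the Subsystem type and the function name are hypothetical, chosen for this example only.

```python
from typing import List, NamedTuple


class Subsystem(NamedTuple):
    name: str
    hierarchy: int
    num_cgroups: int
    enabled: bool


def parse_proc_cgroups(text: str) -> List[Subsystem]:
    """Parse the whitespace-separated content of /proc/cgroups."""
    subsystems = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#subsys_name ..." header.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        subsystems.append(
            Subsystem(name, int(hierarchy), int(num_cgroups), enabled == "1")
        )
    return subsystems


# A few rows copied from the listing above.
SAMPLE = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t6\t13\t1\n"
    "cpu\t4\t89\t1\n"
    "cpuacct\t4\t89\t1\n"
)

if __name__ == "__main__":
    for s in parse_proc_cgroups(SAMPLE):
        print(s.name, s.hierarchy, s.num_cgroups, s.enabled)
```

Subsystems that report the same hierarchy ID, such as cpu and cpuacct in the listing, are attached to the same hierarchy.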

cgroups are organized hierarchically, and child cgroups inherit certain attributes from their parent group. Many different hierarchies of cgroups can exist simultaneously on a system. Each hierarchy is attached to one or more subsystems, where a subsystem represents a single resource like CPU time or memory.

To figure out what cgroups a process belongs to, look at /proc/<pid>/cgroup: the cgroup is shown as a path relative to the root of the hierarchy mount point. "/" means the process has not been assigned to a group, while "/lxc/something" means the process is a member of a container named "something".
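
As an illustration, each line of /proc/<pid>/cgroup has the form "hierarchy-ID:controller-list:path". The sketch below turns such content into a controller-to-path mapping; the function name and the sample content are assumptions made for the example.

```python
from typing import Dict


def parse_pid_cgroup(text: str) -> Dict[str, str]:
    """Map each controller name to the cgroup path the process belongs to."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only, in case the path contains one.
        _hierarchy_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers are listed together, separated by commas.
        for controller in controllers.split(","):
            result[controller] = path
    return result


# Illustrative content only, using the container path mentioned above.
SAMPLE = "4:cpu,cpuacct:/lxc/something\n8:memory:/\n"

if __name__ == "__main__":
    for controller, path in parse_pid_cgroup(SAMPLE).items():
        print(controller, path)
```

In this sample, the process is in the "/lxc/something" group for cpu and cpuacct, and has not been assigned to a group for memory.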

cgroups can be configured via the cgconfig service.

cgroups Subsystems

These subsystems are also known as "controllers":

blkio

Sets limits on input/output access from and to block devices.

cpu

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu

Uses the scheduler to provide cgroup tasks access to the CPU. Usually, the access to CPU is scheduled using the CFS scheduler, and the control parameters make that obvious by using "cfs" in their name. The RT scheduler is also available.

cgroups can be used to control two things:

  • the relative share of CPU time to be allocated to the tasks in the cgroup.
  • ceiling enforcement: a hard limit of the amount of CPU a cgroup can utilize, to prevent the "noisy neighbor" problem, given the fact that arbitrary processes can use all available CPU if they are allowed to.

Controlling Relative Share of CPU

cpu.shares

The relative share of CPU to be allocated to the tasks in a cgroup can be controlled with an integer value specified in the "cpu.shares" file of the cgroup. The integer value specifies the share of CPU time available to the tasks in the cgroup, relative to all other tasks being scheduled at the same time. For example, tasks in two cgroups that have cpu.shares set to 100 will receive equal CPU time, but tasks in a cgroup that has cpu.shares set to 200 receive twice the CPU time of tasks in a cgroup where cpu.shares is set to 100. However, the exact value of that time depends on who else is consuming CPU at that time. The value specified in the cpu.shares file must be 2 or higher.

In case of a multi-core system, shares specified in "cpu.shares" are distributed across all CPU cores of the system.

As a consequence of how CFS works, it is difficult to predict how much CPU an arbitrary task will actually get, because that depends on the process population on a node and on what those processes do.
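
The arithmetic behind cpu.shares can be sketched as follows. This is a simplified model, not a description of the scheduler: it assumes all listed cgroups are siblings and that every one of them has runnable tasks all the time, which is exactly the condition that rarely holds in practice. The function name and cgroup names are hypothetical.

```python
from typing import Dict

MIN_SHARES = 2  # the text above states that cpu.shares must be 2 or higher


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of CPU time each busy sibling cgroup would get."""
    for name, value in shares.items():
        if value < MIN_SHARES:
            raise ValueError(f"cpu.shares for {name!r} must be >= {MIN_SHARES}")
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


if __name__ == "__main__":
    # Two cgroups with 100 shares and one with 200 shares.
    for name, fraction in cpu_fractions({"a": 100, "b": 100, "c": 200}).items():
        print(name, fraction)
```

With these values, "c" gets twice the CPU time of "a" or "b". If "b" goes idle, the remaining two cgroups split the CPU in a 1:2 ratio instead, which is why the absolute amount is hard to predict.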

Controlling CPU Throttling

https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt

cgroups make it possible to limit the amount of CPU time allocated to a cgroup, which helps prevent the "noisy neighbor" problem - processes that consume as much CPU as they can get, reducing everyone else's share.

cpu.cfs_period_us

Specifies a period of time in microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources should be reallocated. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200,000 and cpu.cfs_period_us to 1,000,000. The upper limit of the cpu.cfs_period_us parameter is 1 second and the lower limit is 1,000 microseconds.

cpu.cfs_quota_us

Specifies the total amount of time in microseconds (µs, represented here as "us") for which all tasks in a cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in a cgroup use up all the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200,000 and cpu.cfs_period_us to 1,000,000. Note that the quota and period parameters operate on a CPU basis. To allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200,000 and cpu.cfs_period_us to 100,000.

Setting the value in cpu.cfs_quota_us to -1 indicates that the cgroup does not adhere to any CPU time restrictions. This is also the default value for every cgroup (except the root cgroup).
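
The relationship between quota and period reduces to a division. The sketch below reproduces the two examples given above; the function name is hypothetical, and the return value is expressed as a number of CPUs.

```python
from typing import Optional


def cpu_limit(quota_us: int, period_us: int) -> Optional[float]:
    """Return the CPU ceiling in number of CPUs, or None if unrestricted."""
    if quota_us == -1:
        # -1 means the cgroup is not subject to any CPU time restriction.
        return None
    if period_us <= 0:
        raise ValueError("period must be a positive number of microseconds")
    return quota_us / period_us


if __name__ == "__main__":
    print(cpu_limit(200_000, 1_000_000))  # 0.2 seconds of CPU per second
    print(cpu_limit(200_000, 100_000))    # two full CPUs
    print(cpu_limit(-1, 100_000))         # no ceiling
```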

cpu.stat

  • nr_periods: number of period intervals (as specified in cpu.cfs_period_us) that have elapsed.
  • nr_throttled: number of times tasks in a cgroup have been throttled (that is, not allowed to run because they have exhausted all of the available time as specified by their quota).
  • throttled_time: the total time duration (in nanoseconds) for which tasks in a cgroup have been throttled.
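
These counters can be combined into a simple throttling indicator. The sketch below parses the "name value" lines of cpu.stat and computes the fraction of periods in which the cgroup was throttled; the function names and the sample numbers are made up for the example.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse 'name value' lines such as those found in cpu.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of elapsed periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Illustrative values only.
SAMPLE = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE)
    print("throttled ratio:", throttled_ratio(stats))
    # throttled_time is in nanoseconds.
    print("throttled seconds:", stats["throttled_time"] / 1e9)
```

A ratio that stays high over time suggests that the quota is too small for the workload.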

cpuacct

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuacct
https://docs.docker.com/config/containers/runmetrics/#cpu-metrics-cpuacctstat

Generates automatic reports on CPU resources.

cpuacct.stat

"cpuacct.stat" contains the CPU usage accumulated by the processes of the group, broken down into user and system time. The times are expressed in USER_HZ.

user

"user" time is the amount of time a process has direct control of the CPU, executing process code. Also see /proc/stat cpu.

system

"system" time is the time the kernel is executing system calls on behalf of the process. Also see /proc/stat cpu.

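Because the values in cpuacct.stat are expressed in USER_HZ ticks, they have to be divided by the tick rate to obtain seconds. The sketch below shows the conversion; the function names and sample values are hypothetical, and the tick rate of 100 used in the example is an assumption (it is a common value, but it should be read from the system).

```python
import os
from typing import Dict


def user_hz() -> int:
    """Return the number of USER_HZ ticks per second on this system."""
    return os.sysconf("SC_CLK_TCK")


def cpuacct_stat_seconds(text: str, ticks_per_second: int) -> Dict[str, float]:
    """Convert the 'user' and 'system' tick counts of cpuacct.stat to seconds."""
    seconds = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            seconds[parts[0]] = int(parts[1]) / ticks_per_second
    return seconds


# Illustrative values only.
SAMPLE = "user 250\nsystem 50\n"

if __name__ == "__main__":
    # 100 ticks per second is assumed here; use user_hz() on a real system.
    print(cpuacct_stat_seconds(SAMPLE, 100))
```
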
cpuacct.usage

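Contains the total CPU time, in nanoseconds (10^-9 seconds), consumed by all tasks in the cgroup, including tasks in descendant cgroups. When read from the root cgroup, it covers all tasks on the host.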

cpuacct.usage_percpu

Contains the CPU time, in nanoseconds (10^-9 seconds), consumed on each CPU by all tasks in the cgroup, including tasks in descendant cgroups. Per-CPU usage can help you identify core imbalances, which can be caused by bad configuration.
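
One way to look for a core imbalance is to compare the busiest CPU with the average. The sketch below assumes the file holds one whitespace-separated nanosecond counter per CPU; the function names, the ratio used as an indicator and the sample values are all choices made for this example.

```python
from typing import List


def parse_usage_percpu(text: str) -> List[int]:
    """Parse whitespace-separated per-CPU nanosecond counters."""
    return [int(value) for value in text.split()]


def imbalance(usage: List[int]) -> float:
    """Ratio of the busiest CPU to the mean; 1.0 means perfectly balanced."""
    total = sum(usage)
    if not usage or total == 0:
        return 0.0
    mean = total / len(usage)
    return max(usage) / mean


# Illustrative values only: four CPUs, the first one used much more.
SAMPLE = "4000000000 1000000000 2000000000 1000000000\n"

if __name__ == "__main__":
    usage = parse_usage_percpu(SAMPLE)
    print("per-CPU seconds:", [ns / 1e9 for ns in usage])
    print("imbalance:", imbalance(usage))
```

Since the counters are cumulative, comparing two samples taken some time apart gives a better picture than a single reading.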

cpuset

Assigns individual CPUs and memory nodes to tasks in a cgroup.

devices

freezer

memory

Memory metrics are found in the "memory" cgroup. To enable the memory control group, add the following kernel command-line parameters:

cgroup_enable=memory swapaccount=1

The metrics are available in "memory.stat".

More details:

https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io

Memory Limit

The value for memory limit is available in:

/sys/fs/cgroup/memory/memory.limit_in_bytes
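
The file contains a single integer, in bytes. The sketch below interprets it, under the assumption that a cgroup without a configured limit reports a very large number close to 2^63; the threshold of 2^62 used to detect that case is a hypothetical cut-off picked for this example, not a kernel constant.

```python
from typing import Optional

# Hypothetical cut-off: anything this large is treated as "no limit set".
UNLIMITED_THRESHOLD = 2 ** 62


def parse_memory_limit(text: str) -> Optional[int]:
    """Return the memory limit in bytes, or None if effectively unlimited."""
    value = int(text.strip())
    if value >= UNLIMITED_THRESHOLD:
        return None
    return value


if __name__ == "__main__":
    # Illustrative values: a 512 MiB limit and a "no limit" reading.
    for sample in ("536870912\n", "9223372036854771712\n"):
        limit = parse_memory_limit(sample)
        print("unlimited" if limit is None else f"{limit / 2**20:.0f} MiB")
```

On a live system, the text would come from reading the file shown above.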

net_cls

Tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup.

net_prio

ns

The namespace subsystem.

perf_event

Docker and cgroups

https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io

Operations

The recommended location for cgroup hierarchies:

/sys/fs/cgroup