Revision as of 21:22, 2 March 2021

External

Internal

Overview

A container instantiated from its image by a container runtime executes by default with the access control settings and privileges defined in the image metadata. For example, the user and group that the container processes run as are specified by default with the USER directive in the container image. By default, the processes in the container run in unprivileged mode and get only a limited set of Linux capabilities. The pod and container security contexts, described below, are a declarative method to modify these run-time settings and get the containers to run with a different runtime configuration. As the name implies, all configuration elements controlled by security contexts are security sensitive.

Pod Security Context

The pod security context is a pod-wide section of the pod manifest that defines privileges and access control settings for the pod and all containers running in the pod.

.spec.securityContext

The pod security context holds pod-level security attributes and common container settings that apply to all containers in the pod. Some configuration elements, such as those referring to the pod's volumes, make sense at the pod level only. Other configuration elements, such as the UID or the GID containers run with, are shared with the container security contexts, and when specified in the pod security context, apply to all containers in the pod. Those fields can be overridden by the per-container security context. If the same configuration element is set in both the container security context and the pod security context, the value set in the container security context takes precedence.

kind: Pod
[...]
spec:  
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    runAsNonRoot: true
    fsGroup: 2000
    [...]

Elements Specific to the Pod Security Context

fsGroup: integer, not quoted in the YAML manifest.
fsGroupChangePolicy
supplementalGroups
sysctls

Elements Shared by the Pod Security Context and Container Security Context

runAsUser: integer, not quoted in the YAML manifest.
runAsGroup: integer, not quoted in the YAML manifest.
runAsNonRoot
seLinuxOptions

Container Security Context

Each container may have its own security context definition:

.spec.containers[].securityContext

kind: Pod
[...]
spec:  
  containers:
    - name: some-container
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        runAsNonRoot: true
        [...]

Elements Specific to the Container Security Context

privileged
allowPrivilegeEscalation
readOnlyRootFilesystem
capabilities
procMount
seccompProfile

Pod Security Policy

A pod security policy is a cluster-level API resource that specifies required values or limits for security-sensitive aspects for pod and container configurations, as configured by the pod security context and container security context. If those values are not present in the pod configuration, the pod security policy provides default values. For more details on pod security policies, see:

Pod Security Policy Concepts

Privileges and Access Control Settings

The following sections document privileges and access control settings that can be set and modified with pod security policies and with pod and container security contexts.

Discretionary Access Control

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups

The permissions to access files in a container are based on the User ID and Group ID. More about Discretionary Access Control is available here:

Linux Security Concepts | Discretionary Access Control

`runAsUser`

Specifies the UID the container processes run with.

kind: Pod
[...]
spec:  
  securityContext:
    runAsUser: 1000
    [...]
  containers:
    - name: some-container
      securityContext:
        runAsUser: 2000
      [...]

If not specified in any context, the container metadata USER directive will be used. If no USER metadata is present, the UID will default to root (0). Both pod security context and container security context allow declaring runAsUser.

For more details on how the runAsUser setting influences mount point permissions, see:

Mounting Volumes in Pods | Permissions

`runAsGroup`

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podsecuritycontext-v1-core

Provides the GID to run the entrypoint of the container process. The GID will also be reported as part of the user's groups.

kind: Pod
[...]
spec:  
  securityContext:
    runAsUser: 1000
    runAsGroup: 2000
    [...]
  containers:
    - name: some-container
      securityContext:
        runAsUser: 3000
        runAsGroup: 4000
      [...]

If not set, the container image value is used, and if that is not set, the primary group ID of the container will be root(0). Both pod security context and container security context allow declaring runAsGroup.

runAsGroup cannot be specified without being accompanied by runAsUser. If only runAsGroup is used, the pod fails to start and reports a "runAsGroup is specified without a runAsUser" error message.

For more details on how the runAsGroup setting influences mount point permissions, see:

Mounting Volumes in Pods | Permissions

`runAsNonRoot`

Although containers are mostly isolated from the host system, running their processes as root is considered bad practice. For example, when a host directory is mounted into the container and the process in the container runs as root, the process has full access to the mounted directory. As such, it is common to prevent running a container process as root, regardless of what the container metadata configuration contains. This can be achieved by setting runAsNonRoot to "true". When set to "true", runAsNonRoot prevents a container whose user was set to root in the container metadata from running in that configuration. Both pod security context and container security context allow declaring runAsNonRoot.

kind: Pod
[...]
spec:  
  securityContext:
    runAsNonRoot: true
    [...]
  containers:
    - name: some-container
      securityContext:
        runAsNonRoot: true
      [...]

If runAsNonRoot is set to true and the container attempts to run as root, the pod will end up with a "CreateContainerConfigError" status and an error message along the lines of:

"Error: container has runAsNonRoot and image will run as root".

`supplementalGroups`

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podsecuritycontext-v1-core

supplementalGroups is a pod-level setting that contains a list of groups applied to the first process run in each container, in addition to the container's primary GID. If unspecified, no groups will be added to any container. Also see:

Linux Security Concepts | Supplementary Group List
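
As a minimal sketch of the setting described above, the following manifest declares two supplemental groups at pod level. The pod name, the busybox image, the command and the group IDs are illustrative assumptions, not values taken from a real deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: supplemental-groups-example   # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    # Applied to the first process of every container, in addition to the primary GID.
    supplementalGroups: [4000, 5000]
  containers:
    - name: some-container
      image: busybox                   # assumption: any image that provides "id" works
      command: ["sh", "-c", "id && sleep 3600"]
```

Running "id" in the container should list the supplemental groups next to the primary GID.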

File System Access Control

`readOnlyRootFilesystem`

readOnlyRootFilesystem prevents processes from writing to the container's root filesystem. If set to "true", the container runs with a read-only root filesystem (i.e. no writable layer). Mounted volumes can still be written to. This is a common security practice. readOnlyRootFilesystem can only be set at container security context level.

kind: Pod
[...]
spec:  
  containers:
    - name: some-container
      securityContext:
        readOnlyRootFilesystem: true
      [...]

This configuration can be enforced at pod security policy level:

kind: PodSecurityPolicy
spec:
  readOnlyRootFilesystem: true
  [...]

If the container attempts to write, it'll transition to status "CrashLoopBackOff". The cause is described in the container logs:

[Sat Sep 05 04:07:00.410595 2020] [core:error] [pid 1:tid 140116758865024] (30)Read-only file system: AH00099: could not create /usr/local/apache2/logs/httpd.pid
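
Since mounted volumes stay writable, one possible way around the failure above is to mount a writable volume over the directory the process needs. The sketch below assumes the httpd image and the log path from the error message. The pod and volume names are hypothetical, and whether the image needs other writable paths has not been verified here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: read-only-root-example        # hypothetical name
spec:
  containers:
    - name: some-container
      image: httpd                     # assumption: the image that produced the log line above
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        # Writable emptyDir mounted over the directory httpd writes its pid file to.
        - name: httpd-logs
          mountPath: /usr/local/apache2/logs
  volumes:
    - name: httpd-logs
      emptyDir: {}
```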

`fsGroup`

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podsecuritycontext-v1-core

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#volumes-and-file-systems

fsGroup is a pod-level setting that specifies a special supplemental group ID applying to all containers in the pod.

kind: Pod
[...]
spec:  
  securityContext:
    fsGroup: 3333
    [...]

Some volume types allow the Kubelet to change the ownership of that volume, as projected in the pod, to be owned by the pod:

The owning GID will be the fsGroup
The setgid bit is set. New files created in the volume will be owned by fsGroup.
The permission bits are OR'd with rw-rw----

If not set, the Kubelet will not modify the ownership and permissions of any volume.

"id" run from a container that belongs to a pod configured this way returns the fsGroup among its "groups":

# id
uid=1111 gid=2222 groups=2222,3333
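
A pod along the following lines would be consistent with the "id" output above. This is a sketch only: the pod name, image, volume name and mount path are hypothetical, and the IDs are chosen to match the output shown.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example                # hypothetical name
spec:
  securityContext:
    runAsUser: 1111
    runAsGroup: 2222
    fsGroup: 3333                      # added to the groups of every container process
  containers:
    - name: some-container
      image: busybox                   # assumption: any image that provides "id" works
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch          # hypothetical path; emptyDir supports fsGroup
  volumes:
    - name: scratch
      emptyDir: {}
```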

For more details on how the fsGroup setting influences mount point permissions, see:

Mounting Volumes in Pods | Permissions

Volume Types that Support fsGroup

emptyDir
Some volumes exposed via CSI. See https://kubernetes-csi.github.io/docs/support-fsgroup.html

Volume Types that Do Not Support fsGroup

For the following volumes, setting fsGroup does not have any effect:

Docker Desktop Kubernetes hostPath: files are created with runAsGroup, or with root if runAsGroup is not set.
EKS with EFS exposed as PVs

`fsGroupChangePolicy`

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podsecuritycontext-v1-core

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods

https://kubernetes.io/blog/2020/12/14/kubernetes-release-1.20-fsgroupchangepolicy-fsgrouppolicy/

fsGroupChangePolicy is a pod-level setting that defines how the ownership and permissions of a volume are changed before the volume is exposed inside the pod. This field only applies to volume types that support fsGroup-based ownership and permissions. It has no effect on ephemeral volume types such as secret, configMap and emptyDir. Valid values are "OnRootMismatch" and "Always". If not specified, it defaults to "Always".
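
A minimal sketch of the setting, assuming a persistent volume type that supports fsGroup. The pod name, image, mount path and the claim name "some-claim" are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-change-policy-example  # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    # One of the two valid values; "Always" is the default when the field is omitted.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: some-container
      image: busybox                   # assumption
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data             # hypothetical path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: some-claim          # hypothetical, must exist in the namespace
```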

`allowedProcMountTypes`

sysctls
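
sysctls is listed above among the elements specific to the pod security context. As a sketch, each entry is a name/value pair, with the value given as a string. The sysctl shown here is one that the Kubernetes documentation lists as safe. The pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example                 # hypothetical name
spec:
  securityContext:
    sysctls:
      - name: kernel.shm_rmid_forced
        value: "0"                     # sysctl values are strings and must be quoted
  containers:
    - name: some-container
      image: busybox                   # assumption
      command: ["sleep", "3600"]
```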

`forbiddenSysctls`

`allowedUnsafeSysctls`

Privileged Mode

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged

`privileged`

This setting allows running the container in privileged mode, meaning that the container gets full access to the node's kernel. privileged can only be set at container security context level.

kind: Pod
[...]
spec:  
  containers:
    - name: some-container
      securityContext:
        privileged: true
      [...]

More details on privileged mode:

Linux Security Concepts | Privileged Mode

`allowPrivilegeEscalation`

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privilege-escalation

allowPrivilegeEscalation can only be set at container security context level. This setting controls whether a process can gain more privileges than its parent process. The boolean value directly controls whether the no_new_privs (https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt) flag gets set on the container process. allowPrivilegeEscalation is always true when the container runs as privileged or has CAP_SYS_ADMIN.
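
A minimal sketch of a container that disables privilege escalation. The pod name, image and command are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-privilege-escalation-example  # hypothetical name
spec:
  containers:
    - name: some-container
      image: busybox                      # assumption
      command: ["sleep", "3600"]
      securityContext:
        # Processes in this container cannot gain more privileges than their parent.
        allowPrivilegeEscalation: false
```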

`defaultAllowPrivilegeEscalation`

Linux (Kernel) Capabilities

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#capabilities

https://linux-audit.com/linux-capabilities-hardening-linux-binaries-by-removing-setuid/

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container

Linux capabilities are a fine-grained mechanism that gives a container access only to the kernel features it requires, instead of giving it unlimited permissions by making it a privileged container. Also see:

Linux Capabilities

`capabilities`

This setting allows adding or dropping capabilities on a per-container basis. capabilities can only be set at container security context level.

kind: Pod
[...]
spec:  
  containers:
    - name: some-container
      securityContext:
        capabilities:
          add:
            - SYS_TIME
          drop:
            - CHOWN
      [...]

Linux kernel capabilities are usually prefixed with CAP_ (e.g. CAP_SYS_TIME). However, when specifying them in a pod specification, you must leave out the prefix: SYS_TIME.

`defaultAddCapabilities`

`requiredDropCapabilities`

`allowedCapabilities`

SELinux

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#selinux

More details:

SELinux

`seLinuxOptions`

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container

Both pod security context and container security context allow declaring seLinuxOptions.
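
As a sketch, SELinux labels can be assigned at container level as follows. The level value follows the example in the Kubernetes documentation linked above. The pod name and image are assumptions, and the setting only has an effect on nodes that run SELinux.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-example                # hypothetical name
spec:
  containers:
    - name: some-container
      image: busybox                   # assumption
      command: ["sleep", "3600"]
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c456"
```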

`seLinux`

Seccomp

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-seccomp-profile-for-a-container
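
A minimal sketch of the seccompProfile element listed earlier among the container security context elements, here selecting the default profile of the container runtime. The pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-example                # hypothetical name
spec:
  containers:
    - name: some-container
      image: busybox                   # assumption
      command: ["sleep", "3600"]
      securityContext:
        seccompProfile:
          type: RuntimeDefault         # use the container runtime's default profile
```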

Access to Host Namespaces

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces

hostPID, hostIPC, hostNetwork, hostPorts.
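
On the pod side, host namespace access is requested with boolean fields of the pod spec, which are the settings a policy would then allow or forbid. The sketch below enables all three for illustration only. The pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-namespaces-example        # hypothetical name
spec:
  hostNetwork: true                    # share the node's network namespace
  hostPID: true                        # share the node's PID namespace
  hostIPC: true                        # share the node's IPC namespace
  containers:
    - name: some-container
      image: busybox                   # assumption
      command: ["sleep", "3600"]
```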

Specification of Accepted Volume Types and File System Access Control

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#volumes-and-file-systems

volumes, allowedHostPaths, allowedFlexVolumes


Kubernetes Pod and Container Security: Difference between revisions
