Kubernetes Pod and Container Security: Difference between revisions
Revision as of 00:59, 2 March 2021
External
- https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
- https://kubernetes.io/docs/concepts/security/pod-security-standards/
- https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podsecuritycontext-v1-core
- https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#securitycontext-v1-core
Internal
Overview
A container instantiated from its image by a container runtime executes by default with the access control settings and privileges defined in the image metadata. For example, the user and group that container processes run as are specified by default with the USER directive in the container image. Processes in the container run by default in unprivileged mode and get only a limited set of Linux capabilities. The pod and container security contexts, described below, are a declarative method to modify these run-time settings and get the containers to run with a different runtime configuration. As the name implies, all configuration elements controlled by security contexts are security sensitive.
Pod Security Context
The pod security context is a pod-wide section of the pod manifest that defines privileges and access control settings for the pod and all containers running in the pod.
The pod security context holds pod-level security attributes and common container settings that apply to all containers in the pod. Some configuration elements, such as those referring to the pod's volumes, make sense at the pod level only. Other configuration elements, such as the UID or the GID containers run with, are shared with the container security contexts, and when specified in the pod security context, apply to all containers in the pod. Those fields can be overridden by the per-container security context. If the same configuration element is set in both the container security context and the pod security context, the value set in the container security context takes precedence.
kind: Pod
[...]
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    runAsNonRoot: true
    fsGroup: 2000
  [...]
Elements Specific to the Pod Security Context
- fsGroup: integer, not quoted in the YAML manifest.
- fsGroupChangePolicy
- supplementalGroups
- sysctls
Elements Shared by the Pod Security Context and Container Security Context
- runAsUser: integer, not quoted in the YAML manifest.
- runAsGroup: integer, not quoted in the YAML manifest.
- runAsNonRoot
- seLinuxOptions
Container Security Context
Each container may have its own security context definition:
kind: Pod
[...]
spec:
  containers:
  - name: some-container
    securityContext:
      runAsUser: 1000
      runAsGroup: 3000
      runAsNonRoot: true
    [...]
Elements Specific to the Container Security Context
- privileged
- allowPrivilegeEscalation
- readOnlyRootFilesystem
Pod Security Policy
A pod security policy is a cluster-level API resource that specifies required values or limits for security-sensitive aspects for pod and container configurations, as configured by the pod security context and container security context. If those values are not present in the pod configuration, the pod security policy provides default values. For more details on pod security policies, see:
Privileges and Access Control Settings
The following sections document privileges and access control settings that can be set and modified with pod security policies and with pod and container security contexts.
Discretionary Access Control
The permissions to access files in a container are based on the User ID and Group ID. More about Discretionary Access Control is available here:
runAsUser
runAsUser specifies the UID that the container processes run with.
kind: Pod
[...]
spec:
  securityContext:
    runAsUser: 1000
  [...]
  containers:
  - name: some-container
    securityContext:
      runAsUser: 2000
    [...]
If not specified in any context, the container metadata USER directive will be used. If no USER metadata is present, the UID will default to root (0). Both pod security context and container security context allow declaring runAsUser.
runAsGroup
Both pod security context and container security context allow declaring runAsGroup. If this field is omitted, the primary group ID of the container will be root (0).
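A minimal sketch of setting the primary group together with the user at the pod level. The pod name and the busybox image are assumptions chosen only for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: run-as-group-demo        # hypothetical name
spec:
  securityContext:
    runAsUser: 1000              # UID for all container processes
    runAsGroup: 3000             # primary GID for all container processes
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "id && sleep 3600"]
```

With these settings, the id command in the container is expected to report UID 1000 and GID 3000 instead of root.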
supplementalGroups
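supplementalGroups is a pod-level setting. It lists additional group IDs applied to the first process run in each container, on top of the container's primary GID.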
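As an illustration, a pod that adds two supplementary groups might look like the sketch below. The pod name, group IDs and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: supplemental-groups-demo   # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000, 5000]   # extra GIDs, arbitrary example values
  containers:
  - name: some-container
    image: busybox:1.36                # illustrative image, an assumption
    command: ["sh", "-c", "id && sleep 3600"]
```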
runAsNonRoot
Although containers are mostly isolated from the host system, running their processes as root is considered bad practice. For example, when a host directory is mounted into the container and the process in the container runs as root, that process has full access to the mounted directory. As such, it is common to prevent running a container process as root, regardless of what the container metadata configuration contains. This can be achieved by setting runAsNonRoot to "true". When set to "true", runAsNonRoot will prevent a container whose user was set to root in the container metadata from running in that configuration. Both pod security context and container security context allow declaring runAsNonRoot.
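A sketch of enforcing a non-root user for every container in a pod, with one container also declaring an explicit UID. The names and image are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-demo            # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true           # refuse to start containers that would run as UID 0
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "id && sleep 3600"]
    securityContext:
      runAsUser: 1000            # explicit non-root UID, overrides the image USER
```

Without the explicit runAsUser, an image whose metadata specifies root (or no user at all) would be expected to fail to start under this setting.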
fsGroup
fsGroup is a pod-level setting. It specifies a supplementary group ID that is applied to the pod's volumes, for volume types that support ownership management, and to files created in them.
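A minimal sketch of fsGroup applied to a volume. The pod name, volume name, mount path and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo             # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000                # group applied to the volume and files created in it
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "touch /data/test && ls -ln /data && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data           # hypothetical mount path
  volumes:
  - name: data
    emptyDir: {}
```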
fsGroupChangePolicy
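fsGroupChangePolicy is a pod-level setting. It controls how the ownership and permissions of a volume are changed before the volume is exposed inside the pod. Valid values are "OnRootMismatch" and "Always".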
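The sketch below combines fsGroup with fsGroupChangePolicy on a persistent volume, since the policy has no effect on ephemeral volume types such as emptyDir. The pod name, claim name and image are assumptions, and the claim is presumed to exist already:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-policy-demo      # hypothetical name
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"   # skip the recursive change if the volume root already matches
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim      # hypothetical, pre-existing claim
```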
File System Access Control
readOnlyRootFilesystem
readOnlyRootFilesystem can only be set at the container security context level. If set to "true", the container runs with a read-only root filesystem (i.e. no writable layer). The same setting in a pod security policy requires the containers covered by the policy to run with a read-only root filesystem:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  readOnlyRootFilesystem: true
  [...]
If the containerized process attempts to write to the root filesystem and exits when the write fails, the pod will transition to status "CrashLoopBackOff". The cause is described in the container logs:
[Sat Sep 05 04:07:00.410595 2020] [core:error] [pid 1:tid 140116758865024] (30)Read-only file system: AH00099: could not create /usr/local/apache2/logs/httpd.pid
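One common way to keep the root filesystem read-only while still letting the process write where it must is to mount a writable volume over that path. The sketch below assumes an httpd image, inferred from the log path above, and hypothetical pod and volume names. Other paths may also need to be writable depending on the image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: read-only-root-demo      # hypothetical name
spec:
  containers:
  - name: some-container
    image: httpd:2.4             # assumption, inferred from the log path
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: httpd-logs
      mountPath: /usr/local/apache2/logs   # writable location for httpd.pid
  volumes:
  - name: httpd-logs
    emptyDir: {}
```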
allowedProcMountTypes
sysctls
forbiddenSysctls
allowedUnsafeSysctls
Privileged Mode
privileged
This setting allows running the container in privileged mode. privileged can only be set at the container security context level.
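For illustration only, a container that requests privileged mode could be declared as in the following sketch. The names and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-demo          # hypothetical name
spec:
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      privileged: true           # container-level only; grants broad access to the host
```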
allowPrivilegeEscalation
allowPrivilegeEscalation can only be set at the container security context level. This setting controls whether a process can gain more privileges than its parent process. The boolean value directly controls whether the no_new_privs (https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt) flag gets set on the container process. allowPrivilegeEscalation is always true when the container runs as privileged or has CAP_SYS_ADMIN.
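A minimal sketch of disabling privilege escalation for a single container. The names and image are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-escalation-demo       # hypothetical name
spec:
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      runAsUser: 1000
      allowPrivilegeEscalation: false   # results in no_new_privs being set on the process
```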
defaultAllowPrivilegeEscalation
Linux (Kernel) Capabilities
Also see:
defaultAddCapabilities
requiredDropCapabilities
allowedCapabilities
SELinux
More details:
seLinuxOptions
Both pod security context and container security context allow declaring seLinuxOptions.
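A sketch of assigning an SELinux label at the pod level. The level value is an arbitrary example, the names are hypothetical, and the setting only has an effect on hosts where SELinux is enabled:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-demo             # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"      # example MCS level, an assumption
  containers:
  - name: some-container
    image: busybox:1.36          # illustrative image, an assumption
    command: ["sh", "-c", "sleep 3600"]
```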
seLinux
Seccomp
Access to Host Namespaces
hostPID, hostIPC, hostNetwork, hostPorts.
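As a hedged sketch, a pod security policy that denies access to the host PID, IPC and network namespaces and restricts host ports to a range might look like this. The policy name and port range are hypothetical, and the RunAsAny rules are included only because the policy spec requires those fields:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-host-namespaces       # hypothetical name
spec:
  hostPID: false                 # pods may not share the host PID namespace
  hostIPC: false                 # pods may not share the host IPC namespace
  hostNetwork: false             # pods may not use the host network namespace
  hostPorts:
  - min: 8000                    # example range, an assumption
    max: 8080
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```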
Specification of Accepted Volume Types and File System Access Control
volumes, allowedHostPaths, allowedFlexVolumes.