Kubernetes Container Probes
Latest revision as of 18:54, 27 September 2021
External
- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes (partially synced)
- https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes
- https://medium.com/swlh/fantastic-probes-and-how-to-configure-them-fef7e030bd2f
Internal
TODO
- Merge and deplete OpenShift Container Probes.
Overview
A probe is a diagnostic performed periodically by the kubelet on a container. To perform the diagnostic, the kubelet calls a handler that must be declared and implemented by the container. Each probe has one of these results:
- success - the container passed the diagnostic
- failure - the container failed the diagnostic
- unknown - the diagnostic itself failed, so no action should be taken.
There are three kinds of probes: startup, liveness and readiness.
Handlers
A handler is a piece of logic declared and implemented by the container, which the Kubernetes control mechanism uses to draw conclusions about the state the container is in. There are three types of handlers, described below. Any of these handlers can be used to perform startup, liveness and readiness checks.
ExecAction
The exec action, declared as exec:, executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0.
HTTPGetAction
Performs an HTTP GET request against the container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 180
  timeoutSeconds: 30
  periodSeconds: 25
TCPSocketAction
Performs a TCP check against the container’s IP address on a specified port. The diagnostic is considered successful if the connection is successfully established, which indicates that the port is open.
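A TCP socket handler can be declared in the same way as the HTTP handler. The fragment below is a minimal sketch, not taken from the original page: the port number and the timing values are assumptions chosen for illustration.

```yaml
# Hypothetical container fragment: the probe succeeds as soon as
# a TCP connection to port 5432 can be established.
readinessProbe:
  tcpSocket:
    port: 5432              # assumed port, replace with the port the container listens on
  initialDelaySeconds: 5    # illustrative value
  periodSeconds: 10         # illustrative value
```

A successful connection only shows that something is listening on the port. If the application can accept connections before it can serve requests, an exec or HTTP handler is probably a better fit.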
Container and Pod Startup Check
The startup check is performed by a startup probe. Startup probes were introduced in Kubernetes 1.16. The probe indicates whether the application within the container has started. If a startup probe is not provided, the default result is "success". If a startup probe is provided, all other probes are disabled until the startup probe succeeds. If the startup probe fails, the container is killed and is subject to its restart policy.
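As an illustration of how a startup probe shields a slow-starting application from its liveness probe, consider the sketch below. The path, port and thresholds are assumptions, not values from the original page.

```yaml
# Hypothetical container fragment. The liveness probe is not run
# until the startup probe has succeeded once.
startupProbe:
  httpGet:
    path: /health           # assumed endpoint
    port: 8080              # assumed port
  periodSeconds: 10
  failureThreshold: 30      # allows up to about 30 * 10 = 300 seconds for startup
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 3       # after startup, react to a hang in about 30 seconds
```

The design choice here is to keep the liveness probe tight while giving the application a generous, separately configured startup budget of roughly failureThreshold * periodSeconds.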
Container and Pod Liveness Check
The liveness check is performed on a specific container by a liveness probe. The probe indicates whether the container is running. If a liveness probe is not provided, the default is "success". If a liveness probe is provided and it fails, the container will be killed and then subjected to its restart policy. Note that the pod may contain other containers. If the process in the container is able to crash on its own whenever it encounters an issue or becomes unhealthy, a liveness probe is not needed - the kubelet will automatically perform the correct action in accordance with the pod restart policy.
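A liveness probe is declared per container. The manifest below is a minimal sketch under stated assumptions: the pod name, image, path and port are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-example                    # hypothetical name
spec:
  restartPolicy: Always                     # applied when the container is killed
  containers:
  - name: app
    image: registry.example.com/app:1.0     # hypothetical image
    livenessProbe:
      httpGet:
        path: /health                       # assumed endpoint
        port: 8080                          # assumed port
      periodSeconds: 10
      failureThreshold: 3                   # killed after 3 consecutive failures
```

With these illustrative values, only the "app" container is killed and restarted when its probe fails three times in a row. Any other containers in the pod are not affected by this probe.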
TODO: Clarify the relationship between the health of individual containers and the health of the pod. When does the pod transition from the Running phase to the Failed phase as a function of its containers' health?
Container and Pod Readiness Check
By definition, a container is "ready" when it can successfully respond to requests.
By default, a container's IP address and port pairs (endpoints) are added to a service's Endpoints list, and traffic is forwarded to them, if the service selector matches the pod labels and the pod is "ready", meaning that all of the pod's containers are ready. Usually a pod exposes just one container, so "ready pod" and "ready container" are equivalent in this case. The situation when a pod exposes multiple containers is addressed in detail in the Readiness and Multiple Containers per Pod section.
The notion of being "ready" is specific to each container. For example, in the initialization phase of the pod, its traffic-serving container may need time to load configuration or data, or it may need to perform a warm-up procedure to prevent the first user request from taking too long and affecting user experience. The readiness probe should be designed so that it starts succeeding only after initialization. Containers that serve load in production should always define a readiness probe. A readiness probe should not be used for taking pods out of load in an orderly fashion. That should be done either by deleting the pod, or by defining a label such as "enabled=true" that can be switched on or off.
Playground
- https://github.com/ovidiuf/playground/tree/master/kubernetes/services/readiness-probe
Readiness Probe Operations
If the container does not provide a readiness probe, the default diagnostic result is "success".
If a probe is declared, the default state of readiness before the initial delay is "failure".
Once the probe executes successfully after the initial delay and the container endpoint is added to the Endpoints instance, the corresponding readiness probe is invoked periodically and the endpoint stays in the Endpoints active addresses list as long as the probe executes successfully. The initial delay and probe timing arithmetic is explained in the Probe Template section.
If the readiness probe fails, the endpoint is removed from the list. Technically, the pod's IP address is not actually removed from Endpoints, but listed in the "notReadyAddresses" list. The endpoint does not, however, show up in 'kubectl get ep <name>' output, and the backing pod does not receive traffic anymore.
If the probe starts succeeding again, the pod is put back in traffic.
If there is more than one container per pod and at least one readiness probe fails, the endpoints for all containers in the pod are removed.
The pod's readiness state is displayed in the output of the kubectl get pod command. A pod with one container displays a report similar to:
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Running   0          23s
A pod with two containers displays a report similar to:
NAME    READY   STATUS    RESTARTS   AGE
httpd   1/2     Running   0          31m
Unlike a liveness probe, if a container fails the readiness check, it will not be killed or restarted.
Note that the container may put itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the containers in the pod to stop.
Readiness and Multiple Containers per Pod
If a pod defines multiple containers, each container may declare its own readiness probe. The pod is considered ready when all of its containers are ready. If at least one container is not ready, even if all others are ready, the pod will not count as "ready" and it will not be added to, or it will be removed from the service Endpoints.
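The rule above can be illustrated with a two-container pod in which each container declares its own readiness probe. This is a sketch only: the pod name, labels, images, ports and path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers                        # hypothetical name
  labels:
    app: two-containers                       # label a service selector could match
spec:
  containers:
  - name: web
    image: registry.example.com/web:1.0       # hypothetical image
    readinessProbe:
      httpGet:
        path: /health                         # assumed endpoint
        port: 8080                            # assumed port
  - name: sidecar
    image: registry.example.com/sidecar:1.0   # hypothetical image
    readinessProbe:
      tcpSocket:
        port: 9090                            # assumed port
```

If the "web" probe succeeds while the "sidecar" probe fails, kubectl get pod would be expected to report READY 1/2, and the pod is kept out of the service Endpoints until both probes succeed.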
publishNotReadyAddresses
There are situations when a pod should be added to Endpoints even if it is not ready. A typical situation involves a container that runs a JVM that exposes the main service port and also a debug port. Normally, the ports are not exposed until the container's readiness probe passes, and that most likely requires a full boot process, so debugging the boot process is not possible. The solution is to publish the container's endpoints anyway, as soon as the pod is started, and enable access to debugging even if the main service is not ready yet.
This is configured with:
apiVersion: v1
kind: Service
spec:
  ...
  publishNotReadyAddresses: true
publishNotReadyAddresses, when set to true, indicates that DNS implementations must publish the notReadyAddresses of subsets for the Endpoints associated with the Service. The default value is false. The primary use case for setting this field is to use a StatefulSet's Headless Service to propagate SRV records for its Pods without respect to their readiness for purpose of peer discovery.
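For the StatefulSet peer discovery use case, a headless Service might look like the sketch below. The service name, selector label and port are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra                  # hypothetical name
spec:
  clusterIP: None                  # headless: DNS resolves to the pod addresses directly
  publishNotReadyAddresses: true   # publish pods even before they pass readiness
  selector:
    app: cassandra                 # assumed pod label
  ports:
  - name: cql
    port: 9042                     # assumed port
```

With this setting, peers can discover each other through DNS while they are still booting. A separate Service without publishNotReadyAddresses can still be used for client traffic, so that clients reach only ready pods.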
Manual Readiness Probe Example
readinessProbe:
  exec:
    command:
    - ls
    - /tmp/ready
  initialDelaySeconds: 1
  periodSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  timeoutSeconds: 1
Probe Template
The probe templates are sub-trees in the pod manifest.
kind: Pod
spec:
  containers:
  - name: ...
    readinessProbe|livenessProbe:
      exec:
Example:
readinessProbe|livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - nodetool status | grep -E "^UN\s+${POD_IP}"
  initialDelaySeconds: 90
  periodSeconds: 30
  successThreshold: 1
  failureThreshold: 3
  timeoutSeconds: 5
Elements
Also see Readiness Probe Operations above.
initialDelaySeconds
Specifies the number of seconds after the container has started before the probe is executed for the first time. After the initial delay, the probe is invoked periodically, with a periodicity of periodSeconds seconds.
periodSeconds
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. If the probe executes successfully, the next invocation will be executed in periodSeconds seconds.
timeoutSeconds
Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. A probe attempt that times out counts as a failure.
failureThreshold
Minimum consecutive failures for the probe to be considered failed after having succeeded. Defaults to 3. Minimum value is 1.
successThreshold
Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
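To see how these elements combine, the annotated sketch below walks through the timing arithmetic for a hypothetical readiness probe. All values, the path and the port are illustrative assumptions, and the times are approximate.

```yaml
# Hypothetical readiness probe with annotated timing.
readinessProbe:
  httpGet:
    path: /health           # assumed endpoint
    port: 8080              # assumed port
  initialDelaySeconds: 15   # first attempt roughly 15s after the container starts
  periodSeconds: 10         # later attempts roughly at 25s, 35s, 45s, ...
  timeoutSeconds: 2         # an attempt with no answer within 2s counts as a failure
  failureThreshold: 3       # 3 consecutive failures mark the container not ready
  successThreshold: 1       # a single success marks it ready again
```

With these values, a container that stops responding is marked not ready roughly (failureThreshold - 1) * periodSeconds = 20 seconds after its first failed attempt, plus up to timeoutSeconds for the last attempt to time out.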