OpenShift Container Probes


Container Probe

Users can configure container probes for liveness or readiness. They are sometimes referred to as "pod probes", but they are configured at the container level, not the pod level. Each container can have its own set of probes, which are exercised, and return results, independently. Probes are specified in the pod template.

A probe is executed periodically by Kubernetes and consists of a diagnostic performed on the container. The diagnostic has one of three results: Success, meaning the container passed the diagnostic; Failure, meaning the container failed the diagnostic; or Unknown, meaning the diagnostic execution itself failed, in which case no action is taken.

Liveness Probe

A liveness probe indicates whether the container is running. If the liveness probe fails, Kubernetes kills the container, and the container is subjected to its restart policy, as described in Liveness Probe Failure. If a container does not provide a liveness probe, the liveness diagnostic is considered successful by default.

The following sequence should go in the container declaration from the pod template, at the same level as "name":

livenessProbe:
  initialDelaySeconds: 30
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
  periodSeconds: 10
  tcpSocket:
    port: 5432
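
For orientation, here is a minimal sketch of where the probe sits inside the pod template; the container name, image and port are assumptions for illustration, not part of the original example:

containers:
- name: postgresql                         # hypothetical container name
  image: openshift/postgresql-92-centos7   # hypothetical image
  livenessProbe:                           # same level as "name" and "image"
    tcpSocket:
      port: 5432
    initialDelaySeconds: 30
    timeoutSeconds: 1
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 3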

Readiness Probe

A readiness probe is deployed in a container to expose whether the container is ready to service requests. If a container does not provide a readiness probe, the readiness state after creation is "Success" by default. On readiness probe failure, Kubernetes stops sending traffic to that specific pod by removing the corresponding endpoint from the service, as described in the readiness probe failure section. A readiness probe is useful when we want to automatically stop sending traffic to a pod that enters an unstable state, and resume sending traffic if and when it recovers. It can also be used to implement a mechanism for taking the container down for maintenance. Note that if you just want to be able to drain requests when the pod is deleted, you do not necessarily need a readiness probe: on deletion, the pod automatically puts itself into an unready state regardless of whether a readiness probe exists, and it remains unready while it waits for its containers to stop.

The following sequence should go in the container declaration from the pod template, at the same level as "name":

readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
  periodSeconds: 10
  exec:
    command:
    - /bin/sh
    - -i
    - -c
    - psql -h 127.0.0.1 -U $POSTGRESQL_USER -q -d $POSTGRESQL_DATABASE -c 'SELECT 1'
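
Besides "exec" and "tcpSocket", a probe can also use an "httpGet" handler. Below is a minimal sketch of an HTTP readiness probe; the /healthz path and port 8080 are assumptions for illustration:

readinessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint exposed by the application
    port: 8080       # hypothetical container port
  initialDelaySeconds: 5
  timeoutSeconds: 1
  periodSeconds: 10

An httpGet probe is considered successful if the endpoint responds with an HTTP status code in the 200-399 range.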

Probe Operations

After the container is started, Kubernetes waits for initialDelaySeconds seconds, then triggers the execution of the probe handler specified by "exec", "httpGet", "tcpSocket", etc. Once the probe execution has started, Kubernetes waits up to timeoutSeconds (default 1 second) for it to complete.

If the probe execution is successful, the success counts towards the successThreshold. After a failure, the number of consecutive successful executions specified in successThreshold must be reached for the container to be considered to be passing the probe again. For liveness probes this value must be 1, which is also the default.

If the probe does not complete within timeoutSeconds seconds, or it explicitly fails, the failure counts towards the failureThreshold. The number of consecutive failed executions specified in failureThreshold must be reached before the container is considered to be failing the probe.

The probe is executed periodically, every periodSeconds seconds.
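
As a worked example of how these settings interact, consider the liveness configuration shown earlier, annotated with the resulting timing; the arithmetic assumes a container that hangs immediately after start:

livenessProbe:
  initialDelaySeconds: 30   # first probe runs about 30 seconds after the container starts
  periodSeconds: 10         # subsequent probes run every 10 seconds
  timeoutSeconds: 1         # each execution gets at most 1 second to complete
  failureThreshold: 3       # 3 consecutive failures mark the container as failed
# failures at ~30s, ~40s and ~50s, plus the 1s timeout on the last attempt:
# the container is killed roughly 51 seconds after it started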

Liveness Probe Failure

If the liveness probe fails, Kubernetes kills the container and the container is subjected to its restart policy. A liveness probe that fails occasionally shows up as a non-zero restart count:

NAME                   READY     STATUS    RESTARTS   AGE 
rest-service-1-9p9hj   1/1       Running   3          1m

Note that the pod maintains its name after a restart: the container is restarted in place, the pod itself is not replaced.
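
The probe failures that caused the restarts are typically recorded as events on the pod and can be inspected with:

oc describe pod rest-service-1-9p9hj

The "Events" section at the end of the output lists the failed probe executions (reported as "Unhealthy" events) along with the subsequent container restarts.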

If the liveness probe fails consistently, the repeated container restarts trigger the kubelet's exponential restart back-off; while the pod waits out the back-off delay between restarts, its status is reported as CrashLoopBackOff:

NAME                   READY     STATUS             RESTARTS   AGE
rest-service-1-9p9hj   0/1       CrashLoopBackOff   5          3m

Readiness Probe Failure

If the readiness probe fails, the EndpointsController removes the pod's IP address from the endpoints of all services that match the pod. The service still exists, but it lists fewer endpoints. If the service is backed by a single pod replica, it ends up with zero endpoints.

The pod will still be reported in the Running phase (status), but its container will not be "READY":

NAME                       READY     STATUS    RESTARTS   AGE
po/rest-service-3-bm1t9    0/1       Running   0          2m
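
The endpoint removal can be verified against the service. Assuming a hypothetical service named "rest-service" backed only by this pod, the output would look along these lines:

oc get endpoints rest-service

NAME           ENDPOINTS   AGE
rest-service   <none>      10m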

Note that if the pod "heals" (the readiness probe starts passing again and the number of consecutive successful executions reaches successThreshold), the endpoint is automatically re-attached to the service.