OpenShift Installation Validation

From NovaOrdis Knowledge Base
Latest revision as of 19:04, 7 February 2018

External

Internal

OpenShift Operations
OpenShift 3.5 Installation
OpenShift 3.6 Installation

Connect to the Support Node

As "ansible":

On All Nodes

OpenShift Packages

ansible nodes -m shell -a "yum list installed | grep openshift"

The desired OpenShift version must be installed.

OpenShift Version

ansible nodes -m shell -a "/usr/bin/openshift version"

master1.local | SUCCESS | ...
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4
etcd 3.1.0
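
To spot version drift between hosts, the ansible output above can be reduced to the set of distinct OpenShift versions. This is only a sketch, not part of the original procedure: the function name distinct_openshift_versions is hypothetical, and it assumes the output format shown above, where the version line starts with the word "openshift".

```shell
# Hypothetical helper: read "openshift version" output (as collected by
# ansible from all nodes) on stdin and print each distinct version once.
distinct_openshift_versions() {
    awk '$1 == "openshift" { print $2 }' | sort -u
}
```

Piping the ansible command above into distinct_openshift_versions should print exactly one line if all nodes run the same version.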

Exported Filesystems

On the support node run exportfs and make sure the following filesystems are exported:

exportfs
/nfs                192.168.122.0/255.255.255.0
/nfs/registry       <world>
/nfs/metrics        <world>
/nfs/logging        <world>
/nfs/logging-es-ops <world>
/nfs/etcd           <world>
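
The comparison against the expected list can be scripted. The following is a minimal sketch under stated assumptions: missing_exports is a hypothetical name, and the function assumes that each exported path appears as the first field of an exportfs output line, as in the listing above.

```shell
# Hypothetical helper: read exportfs output on stdin and print every required
# path (passed as arguments) that is not listed as an exported filesystem.
# The body runs in a subshell so its variables do not leak into the caller.
missing_exports() (
    exported=$(awk '{ print $1 }')
    for path in "$@"; do
        # -x matches the whole line, -F treats the path as a fixed string
        printf '%s\n' "$exported" | grep -qxF -- "$path" || echo "$path"
    done
)
```

For example, "exportfs | missing_exports /nfs/registry /nfs/metrics /nfs/logging /nfs/logging-es-ops /nfs/etcd" should print nothing when all filesystems are exported.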

On Masters

On each master node, run as root:

oc get nodes --show-labels

Output example:

NAME                           STATUS                     AGE       LABELS
infranode1.openshift35.local   Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=infra,kubernetes.io/hostname=infranode1.openshift35.local,logging-infra-fluentd=true,logging=true
infranode2.openshift35.local   Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=infra,kubernetes.io/hostname=infranode2.openshift35.local,logging-infra-fluentd=true,logging=true
master1.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master1.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
master2.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master2.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
master3.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master3.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
node1.openshift35.local        Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=app,kubernetes.io/hostname=node1.openshift35.local,logging-infra-fluentd=true,logging=true
node2.openshift35.local        Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=app,kubernetes.io/hostname=node2.openshift35.local,logging-infra-fluentd=true,logging=true
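
On clusters with many nodes it can help to list only the nodes that are not ready. The sketch below assumes the column layout shown in the example output (NAME first, STATUS second); the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: read "oc get nodes" output on stdin, skip the header,
# and print the name of every node whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" (the masters above) still counts as ready.
not_ready_nodes() {
    awk 'NR > 1 && NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}
```

Running "oc get nodes | not_ready_nodes" should print nothing on a healthy cluster.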

Verify etcd

On nodes that run etcd, as root:

etcdctl cluster-health
etcdctl member list

Note that etcdctl2 should be used on OCP 3.7 onward.
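
The health check can be turned into a pass/fail test. This sketch assumes the v2-style cluster-health output, which ends with a summary line reading "cluster is healthy"; verify that against the output of your own etcdctl version before relying on it. The function name etcd_cluster_healthy is hypothetical.

```shell
# Hypothetical helper: read "etcdctl cluster-health" output on stdin and
# succeed only if it contains the exact summary line "cluster is healthy".
# Assumption: v2-style output format; other versions may print differently.
etcd_cluster_healthy() {
    grep -qx 'cluster is healthy'
}
```

Usage would be along the lines of "etcdctl cluster-health | etcd_cluster_healthy || echo 'etcd needs attention'".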

Docker Logs

Log into a few nodes and take a look at the docker logs:

journalctl -f -u docker

Docker Startup Parameters

From the support/installation server, execute as "ansible":

ansible nodes -m shell -a "ps -ef | grep dockerd | grep -v grep"

Make sure "--selinux-enabled" and "--insecure-registry 172.30.0.0/16" are present.

The --insecure-registry option does not seem to propagate. If it is missing, update /etc/sysconfig/docker manually on all docker nodes so that it includes '--insecure-registry 172.30.0.0/16'.
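
The check for the two options can be automated per host. The following is a sketch only: missing_dockerd_options is a hypothetical name, and accepting both the "--insecure-registry 172.30.0.0/16" and "--insecure-registry=172.30.0.0/16" spellings is an assumption about how the option may appear on the command line.

```shell
# Hypothetical helper: read one host's dockerd command line (for example the
# "ps -ef | grep dockerd" output) on stdin and print each required option
# that is missing from it. Runs in a subshell to keep variables local.
missing_dockerd_options() (
    cmdline=$(cat)
    printf '%s\n' "$cmdline" | grep -q -- '--selinux-enabled' \
        || echo '--selinux-enabled'
    printf '%s\n' "$cmdline" | grep -q -- '--insecure-registry[= ]172\.30\.0\.0/16' \
        || echo '--insecure-registry 172.30.0.0/16'
)
```

The function looks at everything it receives as a single command line, so it should be fed the output of one host at a time.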

Master Web Console

At this point the web console should be exposed on the external interface.

https://master.openshift.novaordis.io/

Use the administrative user defined as part of your "identity provider" declaration.

The API server should respond to curl:

curl -k https://master.openshift.novaordis.io/version
{
  "major": "1",
  "minor": "6",
  "gitVersion": "v1.6.1+5115d708d7",
  "gitCommit": "fff65cf",
  "gitTreeState": "clean",
  "buildDate": "2017-10-11T22:44:25Z",
  "goVersion": "go1.7.6",
  "compiler": "gc",
  "platform": "linux/amd64"
}

curl -k https://master.openshift.novaordis.io/healthz
ok
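
When only the version string matters, it can be pulled out of the /version response without extra tooling. This is a minimal sketch that assumes the JSON layout shown above, with "gitVersion" and its value on a single line; api_git_version is a hypothetical name.

```shell
# Hypothetical helper: read the /version JSON response on stdin and print
# the value of the "gitVersion" field.
api_git_version() {
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p'
}
```

For the response shown above, "curl -sk https://master.openshift.novaordis.io/version | api_git_version" would print v1.6.1+5115d708d7.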

DNS

Verify name resolution from masters, infrastructure nodes and nodes:

dig +short docker-registry.default.svc.cluster.local
172.30.53.178

The answer must match the CLUSTER-IP reported by:

oc get -n default svc/docker-registry
NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.30.53.178   <none>        5000/TCP   88d
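
The comparison between the resolved address and the service address can be scripted. The sketch below assumes the single-service output layout shown above (one header line, CLUSTER-IP in the second column); service_cluster_ip is a hypothetical name.

```shell
# Hypothetical helper: read "oc get svc/<name>" output on stdin and print
# the CLUSTER-IP column of the first (and only) service row.
service_cluster_ip() {
    awk 'NR == 2 { print $2 }'
}
```

The value printed by "oc get -n default svc/docker-registry | service_cluster_ip" can then be compared with the output of the dig command on each host.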

MTU Size Verification

TODO: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/html/day_two_operations_guide/day_two_environment_health_checks#day-two-guide-verifying_mtu

Router Status

oc -n default get deploymentconfigs/router
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
router    5          1         1         config

The values in the DESIRED and CURRENT columns should match the number of node hosts.
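
A quick scripted check can at least confirm that DESIRED and CURRENT agree with each other. This sketch does not compare them with the number of hosts, and it assumes the column layout shown above; mismatched_deployments is a hypothetical name.

```shell
# Hypothetical helper: read "oc get deploymentconfigs" output on stdin and
# print the name of every deployment config whose DESIRED (column 3) and
# CURRENT (column 4) values differ.
mismatched_deployments() {
    awk 'NR > 1 && NF >= 4 && $3 != $4 { print $1 }'
}
```

Running "oc -n default get deploymentconfigs | mismatched_deployments" should print nothing when every deployment config has reached its desired replica count.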

Verify internal connectivity, from both a master and a node:

curl -kv https://docker-registry.default.svc.cluster.local:5000/healthz

Registry Status

oc -n default get deploymentconfigs/docker-registry
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
docker-registry   1          1         1         config

Registry Console

https://registry-console-default.apps.openshift.novaordis.io/

oadm Diagnostics

oadm diagnostics

Per-project Validation

Logging Installation Validation

Must be performed after logging installation and post-install configuration:

Logging Installation Validation

Metrics Installation Validation

Must be performed after metrics installation and post-install configuration:

Metrics Installation Validation

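Validation Resources

Day Two Operations Guide - Health Checks: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/html/day_two_operations_guide/day_two_environment_health_checks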