OpenShift Installation Validation

External

Internal

OpenShift Operations
OpenShift 3.5 Installation
OpenShift 3.6 Installation

Connect to the Support Node

As "ansible":
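
A minimal sketch of the connection step, assuming the support node resolves as support.openshift35.local (the hostname is an assumption; substitute your own):

ssh ansible@support.openshift35.local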

On All Nodes

OpenShift Packages

ansible nodes -m shell -a "yum list installed | grep openshift"

The desired OpenShift version must be installed.
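
To pin the check to specific packages instead of grepping the whole list (package naming assumes the enterprise atomic-openshift channel; adjust for yours):

ansible nodes -m shell -a "rpm -qa 'atomic-openshift*'"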

OpenShift Version

ansible nodes -m shell -a "/usr/bin/openshift version"

master1.local | SUCCESS | ...
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Exported Filesystems

On the support node run exportfs and make sure the following filesystems are exported:

exportfs
/nfs                192.168.122.0/255.255.255.0
/nfs/registry       <world>
/nfs/metrics        <world>
/nfs/logging        <world>
/nfs/logging-es-ops <world>
/nfs/etcd           <world>
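
To confirm the exports are visible from the cluster side, query the export list from any node (the support node hostname is an assumption):

showmount -e support.openshift35.local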

On Masters

On each master node, run as root:

oc get nodes --show-labels

Output example:

NAME                           STATUS                     AGE       LABELS
infranode1.openshift35.local   Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=infra,kubernetes.io/hostname=infranode1.openshift35.local,logging-infra-fluentd=true,logging=true
infranode2.openshift35.local   Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=infra,kubernetes.io/hostname=infranode2.openshift35.local,logging-infra-fluentd=true,logging=true
master1.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master1.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
master2.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master2.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
master3.openshift35.local      Ready,SchedulingDisabled   17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,kubernetes.io/hostname=master3.openshift35.local,logging-infra-fluentd=true,logging=true,openshift_schedulable=False
node1.openshift35.local        Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=app,kubernetes.io/hostname=node1.openshift35.local,logging-infra-fluentd=true,logging=true
node2.openshift35.local        Ready                      17m       beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster=hadron,env=app,kubernetes.io/hostname=node2.openshift35.local,logging-infra-fluentd=true,logging=true
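
A convenience one-liner, not part of the original procedure, that prints only problem nodes; word-matching on Ready filters Ready and Ready,SchedulingDisabled lines but still lets NotReady through:

oc get nodes --no-headers | grep -vw Ready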

Verify etcd

On nodes that run etcd, as root:

etcdctl cluster-health
etcdctl member list

Note that etcdctl2 should be used on OCP 3.7 onward.
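
If the bare commands fail with TLS errors, pass the client certificates explicitly. The paths below are the stock locations on OCP masters; verify them on your hosts:

etcdctl --ca-file=/etc/etcd/ca.crt \
        --cert-file=/etc/etcd/peer.crt \
        --key-file=/etc/etcd/peer.key \
        --endpoints=https://$(hostname):2379 \
        cluster-health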

Docker Logs

Log into a few nodes and take a look at the docker logs:

journalctl -f -u docker
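
To scan a window of logs for problems instead of tailing interactively (the grep pattern is only a suggestion):

journalctl -u docker --since "1 hour ago" --no-pager | grep -iE 'error|fail|denied'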

Docker Startup Parameters

From the support/installation server, execute as "ansible":

ansible nodes -m shell -a "ps -ef | grep dockerd | grep -v grep"

Make sure "--selinux-enabled" and "--insecure-registry 172.30.0.0/16" are present.

Note: --insecure-registry does not seem to propagate from the installer; if it is missing, update /etc/sysconfig/docker manually on all Docker nodes with '--insecure-registry 172.30.0.0/16'.
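
A sketch of how to verify the setting across the fleet, and restart Docker after a manual edit (standard ansible modules; --forks 1 restarts the nodes serially rather than all at once):

ansible nodes -m shell -a "grep '^OPTIONS' /etc/sysconfig/docker"
ansible nodes -m service -a "name=docker state=restarted" --forks 1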

Master Web Console

At this point the web console should be exposed on the external interface.

https://master.openshift.novaordis.io/

Use the administrative user defined as part of your "identity provider" declaration.

The API server should respond to curl:

curl -k https://master.openshift.novaordis.io/version
{
  "major": "1",
  "minor": "6",
  "gitVersion": "v1.6.1+5115d708d7",
  "gitCommit": "fff65cf",
  "gitTreeState": "clean",
  "buildDate": "2017-10-11T22:44:25Z",
  "goVersion": "go1.7.6",
  "compiler": "gc",
  "platform": "linux/amd64"
}
curl -k https://master.openshift.novaordis.io/healthz
ok

DNS

Verify name resolution:

dig +short docker-registry.default.svc.cluster.local
172.30.53.178

Run the query from masters, infrastructure nodes, and application nodes.
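
All hosts can be queried in one pass from the support node, assuming dig (bind-utils) is installed on them:

ansible nodes -m shell -a "dig +short docker-registry.default.svc.cluster.local"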

The answer must match the CLUSTER-IP reported by:

oc get -n default svc/docker-registry
NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.30.53.178   <none>        5000/TCP   88d

MTU Size Verification

TODO: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/html/day_two_operations_guide/day_two_environment_health_checks#day-two-guide-verifying_mtu
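
Pending the TODO, the gist of the linked check: the mtu value under networkConfig in node-config.yaml should be 50 bytes below the MTU of the physical interface, to leave room for VXLAN overhead. A quick comparison (the interface name is an assumption):

grep -A 2 'networkConfig:' /etc/origin/node/node-config.yaml
ip link show eth0 | grep mtu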

Router Status

oc -n default get deploymentconfigs/router
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
router    5          1         1         config

The values in the DESIRED and CURRENT columns should match the number of node hosts.
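
To see where the router replicas actually landed (pods created by a deployment configuration carry the deploymentconfig label automatically):

oc -n default get pods -l deploymentconfig=router -o wide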

Check internal connectivity to the registry (from both a master and a node):

curl -kv https://docker-registry.default.svc.cluster.local:5000/healthz

Registry Status

oc -n default get deploymentconfigs/docker-registry
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
docker-registry   1          1         1         config
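
Since the registry is expected to be backed by the /nfs/registry export above, it is worth confirming the volume actually attached (a convenience check, not from the original procedure):

oc -n default describe dc/docker-registry | grep -A 3 -i volumes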

Registry Console

https://registry-console-default.apps.openshift.novaordis.io/

oadm Diagnostics

oadm diagnostics
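
Run with no arguments it executes every available check; individual checks can also be named (list them with oadm diagnostics --help), for example:

oadm diagnostics ClusterRegistry ClusterRouter NodeDefinitions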

Per-project Validation

Logging Installation Validation

Must be performed after logging installation and post-install configuration:

Logging Installation Validation

Metrics Installation Validation

Must be performed after metrics installation and post-install configuration:

Metrics Installation Validation

Validation Resources

Day Two Operations Guide - Health Checks: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.7/html/day_two_operations_guide/day_two_environment_health_checks