OpenShift 3.6 Installation

From NovaOrdis Knowledge Base
Revision as of 10:28, 11 November 2017 by Ovidiu (talk | contribs) (→‎Post-Install)
Jump to navigation Jump to search

External

Internal

Overview

This document covers OpenShit Container Platform 3.6 advanced installation procedure. The procedure it is largely based on https://docs.openshift.com/container-platform/3.6/install_config/index.html and contains additional details observed during an actual installation process on KVM guests.

Location

noper430:/root/environments/ocp36
RackStation:/base/NovaOrdis/Archive/yyyy.mm.dd-ocp36

Diagram

OpenShift 3.6 Installation Diagram

Hardware Requirements

https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html#hardware

System and Environment Prerequisites

The pre-requisites and all preparatory steps presented in the following link are also addressed in this procedure. The links are provided for reference only:

https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html
https://docs.openshift.com/container-platform/3.6/install_config/install/host_preparation.html

Virtualization Host Preparation

Guest Template Preparation

Create VM templates, in this order:

Cloud Provider-Specific Configuration

Guest Configuration

External DNS Setup

An external DNS server is required. Controlling a registrar (such as http://godaddy.com) DNS zone will provide all capabilities needed to set up external DNS address resolution. A valid alternative, though more complex, is to install a dedicated public DNS server. The DNS server must be available to all OpenShift environment nodes, and also to external clients than need to resolve public names such as the master public web console and API URL, the public wildcard name of the environment, the application router public DNS name, etc. A combination of a public DNS server that resolves public addresses and an internal DNS server, deployed as part of the environment, usually on the support node, tasked with resolving internal addresses is also a workable solution. This installation procedure includes installation of the bind DNS server on the support node - see bind DNS server installation section below.

Wildcard Domain for Application Traffic

The DNS server that resolves the public addresses will need to be capable to support wildcard sub-domains and resolve the public wildcard DNS entry to the public IP address of the node that executes the default router, or of the public ingress node that proxies to the default router. If the environment has multiple routers, an external load balancer is required, and the wildcard record must contain the public IP address of the host that runs the load balancer. The name of the wildcard domain will be later specified during the advanced installation procedure in the Ansible inventory file as openshift_master_default_subdomain.

Configuration procedures:

OpenShift Advanced Installation

https://docs.openshift.com/container-platform/3.6/install_config/install/advanced_install.html#install-config-install-advanced-install

The Support Node

Execute the installation as "ansible" from the support node.

As "root" on "support":

cd /usr/share/ansible
chgrp -R ansible /usr/share/ansible
chmod -R g+w /usr/share/ansible
chgrp ansible /etc/ansible/
chmod g+w /etc/ansible/

The support node needs at least 1 GB or RAM to run the installation process.

Create an ansible key pair on "support" and disseminate it to all other nodes. As "ansible":

cd
ssh-keygen -q -b 2048 -f ~/.ssh/id_rsa -t rsa

Configure Ansible Inventory File

OpenShift Ansible hosts Inventory File

Pre-Flight

On the support node, as "root":

cd /tmp
rm -r tmp* yum*
rm /usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

As "ansible":

cd /etc/ansible
ansible all -m ping
ansible nodes -m shell -a "systemctl status docker"
ansible nodes -m shell -a "docker version"
ansible nodes -m shell -a "pvs"
ansible nodes -m shell -a "vgs"
ansible nodes -m shell -a "lvs"
ansible nodes -m shell -a "nslookup something.apps.openshift.novaordis.io"
ansible nodes -m shell -a "mkdir /mnt/tmp"
ansible nodes -m shell -a "mount -t nfs support.ocp36.local:/nfs /mnt/tmp"
ansible nodes -m shell -a "hostname > /mnt/tmp/\$(hostname).txt"
ansible nodes -m shell -a "umount /mnt/tmp"
for i in /nfs/*.ocp36.local.txt; do echo $i; cat $i; done

Snapshot the Guest Images for the Entire Environment

On the virtualization host:

virsh vol-list main-storage-pool --details
./stop-ose36
virsh list --all
qemu-img snapshot
for i in \
 /main-storage-pool/ocp36.ingress.qcow2 \
 /main-storage-pool/ocp36.support.qcow2 \
 /main-storage-pool/ocp36.master.qcow2 \
 /main-storage-pool/ocp36.infranode.qcow2 \
 /main-storage-pool/ocp36.node1.qcow2 \
 /main-storage-pool/ocp36.node2.qcow2; \
 do \
   echo "$i snapshots:"; \
   qemu-img snapshot -l $i; \
 done
for i in \
 /main-storage-pool/ocp36.ingress.qcow2 \
 /main-storage-pool/ocp36.support.qcow2 \
 /main-storage-pool/ocp36.master.qcow2 \
 /main-storage-pool/ocp36.infranode.qcow2 \
 /main-storage-pool/ocp36.node1.qcow2 \
 /main-storage-pool/ocp36.node2.qcow2; \
 do \
   echo "taking snapshot of $i"; \
   qemu-img snapshot -c before_ansible_installation $i; \
 done
./start-ose36

Reverting the Environment to a Snapshot State

./stop-ose36
for i in \
 /main-storage-pool/ocp36.ingress.qcow2 \
 /main-storage-pool/ocp36.support.qcow2 \
 /main-storage-pool/ocp36.master.qcow2 \
 /main-storage-pool/ocp36.infranode.qcow2 \
 /main-storage-pool/ocp36.node1.qcow2 \
 /main-storage-pool/ocp36.node2.qcow2; \
 do \
   echo "reverting $i to snapshot"; \
   qemu-img snapshot -a before_ansible_installation $i; \
 done

Run the Advanced Installation

May want to run the pre-flight checks one more time after reboot.

As "ansible" on support:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

For verbose installation use -vvv or -vvvv.

ansible-playbook -vvvv /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml > ansible.out

To use a different inventory file than /etc/ansible/hosts, run:

ansible-playbook -vvvv -i /custom/path/to/inventory/file /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml > ansible.out

Output of a Successful Run

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************************************************************************
api-lb.ocp36.local         : ok=97   changed=2    unreachable=0    failed=0
infranode.ocp36.local      : ok=257  changed=58   unreachable=0    failed=0
localhost                  : ok=14   changed=0    unreachable=0    failed=0
master.ocp36.local         : ok=1039 changed=290  unreachable=0    failed=0
node1.ocp36.local          : ok=256  changed=58   unreachable=0    failed=0
node2.ocp36.local          : ok=256  changed=58   unreachable=0    failed=0
support.ocp36.local        : ok=100  changed=3    unreachable=0    failed=0

Uninstallation

In case the installation procedure runs into problems, troubleshoot and before re-starting the installation procedure, uninstall:

ansible-playbook [-i /custom/path/to/inventory/file] /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

This a relative quick method to iterate over the installation configuration and come up with a stable configuration. However, in order to install a production deployment, you must start from a clean operating system installation. If you are using virtual machines, start from a fresh image. If you are using bare metal machines, run the following on all hosts:

yum -y remove openshift openshift-* etcd docker docker-common
rm -rf /etc/origin /var/lib/openshift /etc/etcd \
   /var/lib/etcd /etc/sysconfig/atomic-openshift* /etc/sysconfig/docker* \
   /root/.kube/config /etc/ansible/facts.d /usr/share/openshift

Configure HTTP Proxying

The installation procedure should have already configured the ingress node load balancer, which is also aliased as "api-lb.ocp36.local".

Configure the load balancer HAProxy on the "api-lb" node to log in/var/log/haproxy.log. The procedure is described here:

HAProxy Logging Configuration

Configure Application Proxying

In case the infrastructure node(s) the router pod(s) have been deployed on does not expose a public address, we need to stand up an application HTTP proxy, or reconfigure the existing HAProxy already deployed on api-lb.ose36.local to also proxy application traffic. The public application traffic is directed to the wildcard domain configured on the external DNS.

The current noper430 installation stands up two interfaces /dev/ens8 and /dev/ens9 configured with two distinct public IP addresses, and one of the network interfaces is used for API/master traffic, while the other is used for application traffic. This solution was chosen because no obvious solution was immediately available to configure HAProxy to route HTTP traffic based on its Host header in HAProxy tcp mode. It does not mean such a solution does not exist.

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    maxconn     20000
    log         127.0.0.1:514 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          300s
    timeout server          300s
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 20000

listen stats :9000
    mode http
    stats enable
    stats uri /

frontend master_external_frontend
    bind 104.50.201.84:443
    default_backend master_backend
    mode tcp

frontend master_internal_frontend
    bind 192.168.122.22:443
    default_backend master_backend
    mode tcp

frontend app_frontend
    bind 104.50.201.85:443
    default_backend app_backend
    mode tcp

backend master_backend
    balance source
    mode tcp
    server      master0 192.168.122.24:443 check

backend app_backend
    balance source
    mode tcp
    server      infranode 192.168.122.25:443 check

Reboot the Whole Environment

./stop-ose36
./start-ose36

Base Installation Validation

OpenShift Installation Validation

Logging Installation

Logging Installation

Metrics Installation

Metrics Installation

Post-Install

Configure Project Node Selector

oc edit namespace openshift-infra
metadata:
  annotations:
    ...
    openshift.io/node-selector: env=infra

Configure the Number of Cores used by the Master and Node OpenShift Processes

configure the number of cores used by the master and node OpenShift processes

Test oadm On Master

oadm top pod
oadm top node

Test oc on an External Node

oc get status
oc get pods

Snapshot the Working Environment

./stop-ose36

for i in \
 /main-storage-pool/ocp36.ingress.qcow2 \
 /main-storage-pool/ocp36.support.qcow2 \
 /main-storage-pool/ocp36.master.qcow2 \
 /main-storage-pool/ocp36.infranode.qcow2 \
 /main-storage-pool/ocp36.node1.qcow2 \
 /main-storage-pool/ocp36.node2.qcow2; \
 do \
   echo "taking snapshot of $i"; \
   qemu-img snapshot -c ocp36_installed $i; \
 done