Kubernetes Concepts
Overview
Kubernetes is a container orchestration platform, offering the ability to orchestrate Docker containers across multiple hosts. It manages containers in a clustered environment: it orchestrates containers at scale, defines application topologies, handles parts of the container networking, manages container state and schedules containers across hosts.
Master
The master provides the API, determines where pods run (see Scheduler), and maintains the environment's state; components such as the scheduler and the replication controller run on it.
Node
A node is a Linux container host.
It is based on RHEL or Red Hat Atomic and provides a runtime environment where applications run inside containers, which are contained in pods assigned by the master. Nodes are orchestrated by masters.
Nodes can be organized into many different topologies.
A node daemon runs on each node.
What is the difference between the kubelet and the node daemon?
Each node also runs a kube-proxy daemon, which proxies service connections to the backing pods.
Pod
One or more containers deployed together on one node; the smallest unit that Kubernetes schedules and manages.
Storage
Persistent Volume
Represented by a PersistentVolume object. Persistent volumes are provisioned for the cluster as a whole; the claims that request them (see below) are associated with a project.
An administrator provisions persistent volumes from sources such as:
- NFS
- GCE Persistent Disks
- EBS Volumes
- GlusterFS
- OpenStack Cinder
- Ceph RBD
- iSCSI
- Fibre Channel
Storage resources are requested by laying a claim to the resource (PersistentVolumeClaim). A persistent volume claim is a request for a resource with specific attributes. When a request is made, a process matches it to an available volume and binds them together. The runtime finds the volume bound to the claim and mounts it into the pod.
Persistent volumes can be recycled after use. The reclamation policy is based on the "persistentVolumeReclaimPolicy" declared in the PersistentVolume object definition. The policy can be "Retain" or "Recycle".
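To make the flow concrete, here is a minimal sketch of an NFS-backed volume and a claim that would bind to it; the names, server address and sizes are illustrative, not taken from any real environment:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv0001                            # hypothetical name
    spec:
      capacity:
        storage: 5Gi                          # capacity the volume offers
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain   # or Recycle, as described above
      nfs:                                    # one of the sources listed above
        server: nfs.example.com               # hypothetical NFS server
        path: /exports/pv0001
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: claim1                            # hypothetical name
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi                        # matched against the capacity of available volumes

The matching process binds claim1 to pv0001 because the requested access mode and size fit; a pod then references the claim, not the volume, in its definition.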
etcd
A distributed key/value datastore for state within the environment.
Scheduler
Scheduling is essentially the master's main function: when a user decides to create a pod, the master determines where to do this - this is called scheduling. The scheduler is a component that runs on the master and determines the best fit for running pods across the environment. The scheduler also spreads pod replicas across nodes, for application HA. The scheduler reads data from the pod definition and tries to find a node that is a good fit based on configured policies. The scheduler does not modify the pod; instead, it creates a binding that ties the pod to the selected node, via the master API.
The scheduler is completely independent and exists as a standalone, pluggable solution.
The scheduler is deployed as a container (referred to as an infrastructure container).
The functionality of the scheduler can be extended in two ways:
- Via enhancements, by adding new predicates and priority functions.
- Via replacement with a different implementation.
Default Scheduler Implementation
The default scheduler is a scheduling engine that selects the node to host the pod in three steps:
- Filter all available nodes by running them through a list of filter functions called predicates, discarding the nodes that do not meet the criteria.
- Prioritize the remaining nodes by passing them through a series of priority functions, each of which assigns the node a score between 0 and 10, where 10 signifies the best possible fit to run the pod. By default all priority functions are considered equivalent, but they can be weighted differently via configuration.
- Sort the nodes by score and select the node with the highest score. If multiple nodes share the top score, one of them is chosen at random.
Predicates
Static Predicates
- PodFitsPorts - a node is fit if there are no port conflicts.
- PodFitsResources - a node is fit based on resource availability. Nodes declare resource capacities, pods specify what resources they require.
- NoDiskConflict - evaluates if a pod fits based on volumes requested and those already mounted.
- MatchNodeSelector - a node is fit based on the node selector query (see the pod sketch after this list).
- HostName - a node is fit based on the presence of the Host parameter and a string match with the node's host name.
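As an illustration of the node selector query used by MatchNodeSelector, a pod definition can carry a nodeSelector field; only nodes labeled with the matching key/value pass the predicate. The label and names below are hypothetical:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-pod              # hypothetical name
    spec:
      nodeSelector:
        disktype: ssd            # only nodes labeled disktype=ssd pass MatchNodeSelector
      containers:
      - name: web
        image: nginx             # hypothetical image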
Configurable Predicates
- ServiceAffinity - filters out nodes that do not belong to the topological level defined by the provided labels.
- LabelsPresence - checks whether the node has certain labels defined, regardless of value.
Priority Functions
Existing Priority Functions
- LeastRequestedPriority - favors nodes with fewer requested resources: it calculates the percentage of memory and CPU requested by the pods scheduled on the node, and prioritizes the nodes with the highest available capacity.
- BalancedResourceAllocation - favors nodes with a balanced resource usage rate: it calculates the difference between the consumed CPU and memory as a fraction of capacity, and prioritizes the nodes with the smallest difference. It should always be used together with LeastRequestedPriority.
- ServiceSpreadingPriority - spreads pods across nodes by minimizing the number of pods belonging to the same service that land on the same node.
- EqualPriority - gives all nodes an equal score; effectively disables prioritization.
Configurable Priority Functions
- ServiceAntiAffinity - spreads the pods belonging to the same service across the topological level defined by the provided labels.
- LabelsPreference - prefers (or avoids) nodes that have certain labels defined, regardless of value.
Scheduler Policy
The selection of the predicates and the priority functions defines the scheduler policy.
Scheduler Policy File
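A minimal sketch of such a file, combining predicates and priority functions from the lists above; the weights and the configurable Zone predicate are illustrative, not a recommended configuration:

    kind: Policy
    apiVersion: v1
    predicates:
    - name: PodFitsPorts
    - name: PodFitsResources
    - name: NoDiskConflict
    - name: MatchNodeSelector
    - name: Zone                       # hypothetical configurable ServiceAffinity predicate
      argument:
        serviceAffinity:
          labels:
          - zone                       # nodes must share the pod's "zone" label value
    priorities:
    - name: LeastRequestedPriority
      weight: 1
    - name: BalancedResourceAllocation
      weight: 1                        # weights are illustrative; a higher weight means more influence
    - name: ServiceSpreadingPriority
      weight: 2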
Namespace
A namespace provides scope for:
- named resources to avoid naming collisions
- delegating management authority to trusted users
- the ability to limit community resource consumption
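A namespace is declared like any other object; a minimal sketch (the name is hypothetical):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: development        # hypothetical namespace name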
Policies
Policies are rules that specify the actions that users can and cannot perform on objects (pods, services, etc.).
Service
A service represents a group of pods, which may come and go, and its primary function is to provide the permanent IP, hostname and port for other applications to use. A service resource is an abstraction that defines a logical set of pods and a policy that is used to access the pods. The service layer is how applications communicate with one another.
The service serves as an internal load balancer: it identifies a set of replicated pods and then proxies the connections it receives to those pods (routers provide external load balancing).
The service is not a running process, but an entry in the configuration.
Backing pods can be added to or removed from the service arbitrarily. This way, anything that depends on the service can refer to it as a consistent IP:port pair. The service uses a label selector to find all the running containers associated with it.
Service Definition File
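A sketch of such a file; the service name, selector and ports are illustrative:

    apiVersion: v1
    kind: Service
    metadata:
      name: frontend             # hypothetical name; gives dependents a stable address
    spec:
      selector:
        app: web                 # label selector: the service proxies to pods labeled app=web
      ports:
      - port: 80                 # the stable port exposed by the service
        targetPort: 8080         # the port the backing containers actually listen on

Any pod carrying the app=web label is picked up as a backend; pods without it are ignored, which is how backing pods can come and go freely.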
API
Label
Labels are simple key/value pairs that can be used to group and select arbitrarily related objects. Most Kubernetes objects can include labels in their metadata.
Labels provide the default way of managing objects as groups, instead of having to handle each object individually.
Selector
A set of labels used to match objects; an object is selected when its labels match all of those in the selector.
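For example (all names hypothetical), a pod carries labels in its metadata, and a selector such as app=web then matches it as part of a group (e.g. kubectl get pods -l app=web):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-1
      labels:
        app: web                 # arbitrary key/value pairs
        tier: frontend
    spec:
      containers:
      - name: web
        image: nginx             # hypothetical image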
Replica
A replica is a set of pods sharing the same definition.
Replication Controller
A master component that ensures that the specified number of pod replicas defined in the environment state are running at all times. If pods exit or are deleted, the replication controller instantiates more pods, up to the desired number. If there are more pods running than desired, it deletes as many as necessary. It is NOT the replication controller's job to perform autoscaling based on load or traffic.
The definition of a replication controller includes the number of replicas to be maintained, the pod definition for creating the replicated pod, and a selector for identifying managed pods.
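Putting those three parts together, a sketch of a replication controller definition (names and counts are illustrative):

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: frontend-rc          # hypothetical name
    spec:
      replicas: 3                # number of replicas to maintain
      selector:
        app: web                 # identifies the pods this controller manages
      template:                  # the pod definition used to create new replicas
        metadata:
          labels:
            app: web             # must match the selector above
        spec:
          containers:
          - name: web
            image: nginx         # hypothetical image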