Kubernetes Storage Concepts

From NovaOrdis Knowledge Base
Revision as of 17:49, 12 December 2019 by Ovidiu

Internal

Overview

Kubernetes has a mature and feature-rich subsystem called the persistent volume subsystem, which exposes external storage to applications.

Volumes

https://kubernetes.io/docs/concepts/storage/volumes

Regardless of where it comes from, external storage is exposed to pods in the form of volumes.

A Kubernetes volume has the same lifetime as the pod that encloses it. The volume outlives any containers that run within the pod, and data is preserved across container restarts. However, when a pod ceases to exist, the volume ceases to exist as well. A pod can use multiple volumes at the same time. Conceptually, a pod volume is just a directory that is accessible to the containers in the pod. The backing medium of the directory and its contents are determined by the particular volume type used. More details on how volumes and volume mounts are declared in pod manifests are available in:

Pod Storage

Also see difference between a (pod) volume and a Persistent volume.

Volume Types

configMap

This type of volume is backed by a ConfigMap API resource instance. For more details, see:

ConfigMap

secret

This type of volume is backed by a Secret API resource instance. secret volumes are backed by tmpfs (RAM-backed filesystem) so they are never written to non-volatile storage. For more details, see:

Secrets
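
The following is a minimal sketch of a pod that mounts a secret volume. It assumes a Secret named example-secret already exists in the pod's namespace. The pod name, image, secret name and mount path are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-example            # hypothetical name
spec:
  containers:
  - name: app
    image: busybox                # illustrative image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: credentials
      mountPath: /etc/credentials # each key of the Secret appears as a file here
      readOnly: true
  volumes:
  - name: credentials
    secret:
      secretName: example-secret  # assumed to exist already
```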

downwardAPI

https://kubernetes.io/docs/concepts/storage/volumes/#downwardapi

emptyDir

https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

An emptyDir volume is erased when the pod is removed.
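
As an illustration, the sketch below shows two containers in the same pod sharing an emptyDir volume. All names, the image and the commands are hypothetical. The data written to /scratch survives container restarts but is lost when the pod is removed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-example          # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox                # illustrative image
    # appends a timestamp to the shared file every five seconds
    command: ["sh", "-c", "while true; do date >> /scratch/log; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: reader
    image: busybox
    # follows the same file through its own mount of the volume
    command: ["sh", "-c", "touch /scratch/log; tail -f /scratch/log"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}                  # created empty when the pod is assigned to a node
```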

hostPath

https://kubernetes.io/docs/concepts/storage/volumes/#hostpath

A hostPath volume mounts a file or a directory from the node's host file system into the pod.


Normally, this is not something that pods should do, as it couples a pod with a specific node. This type of volume might introduce non-determinism in the pod behavior because pods with identical configuration may behave differently on different nodes due to different files on the nodes. The recommended way to consume local storage is via local volumes.

The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged container or modify the file permissions on the host to be able to write to a hostPath volume.

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    ...
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory

path

Required parameter that specifies the path on the local host filesystem.

type

An empty string (default) is for backward compatibility, which means that no checks will be performed before mounting the hostPath volume.

Other supported values:

  • DirectoryOrCreate. If nothing exists at the given path, an empty directory will be created there as needed with permission set to 0755, having the same group and ownership with Kubelet.
  • Directory. A directory must exist at the given path
  • FileOrCreate. If nothing exists at the given path, an empty file will be created there as needed with permission set to 0644, having the same group and ownership with Kubelet.
  • File. A file must exist at the given path
  • Socket. A UNIX socket must exist at the given path
  • CharDevice. A character device must exist at the given path
  • BlockDevice. A block device must exist at the given path
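
To show how the type values above change the mount behavior, here is a sketch of a volumes fragment that uses two of them. The volume names and host paths are hypothetical.

```yaml
volumes:
- name: data-dir
  hostPath:
    path: /var/local/example-data   # hypothetical path
    # created as an empty directory on the node if nothing exists at the path
    type: DirectoryOrCreate
- name: runtime-socket
  hostPath:
    path: /var/run/example.sock     # hypothetical path
    # a UNIX socket must already exist at the path
    type: Socket
```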

local

https://kubernetes.io/docs/concepts/storage/volumes/#local

A local volume is storage physically attached to the node host. As such, a local volume on a certain node will be only available to pods scheduled on that node. This storage model makes sense for StatefulSets, but not for other pod deployment models: using local storage ties the application to specific nodes, making it harder to schedule. If that node or local volume encounters a failure and becomes inaccessible, then that pod also becomes inaccessible. In addition, many cloud providers do not provide extensive data durability guarantees for local storage, so all data could be lost in certain scenarios. Applications that are suitable for local storage should be tolerant of node failures, data unavailability, and data loss (e.g. Cassandra).

The local volume mechanism allows exposing a local disk, partition or directory. The storage can be exposed to the pod as a filesystem or as block storage. Block mode was an alpha feature at the time of writing; it is useful to workloads that need to access block devices directly and manage their own data format.

Local volumes are available since v1.14.
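
A local volume is declared as a persistent volume that carries a node affinity, so that the scheduler places pods that use it on the node where the storage lives. The sketch below assumes a node named example-node with a disk mounted at /mnt/disks/ssd1, and a storage class named local-storage. All three names are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv            # hypothetical name
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem            # Block would expose the raw device instead
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # assumed storage class name
  local:
    path: /mnt/disks/ssd1           # assumed path on the node
  nodeAffinity:                     # ties the volume to the node that holds it
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node            # assumed node name
```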

local Volume Operations

nfs

https://kubernetes.io/docs/concepts/storage/volumes/#nfs

An nfs volume allows an existing NFS share to be mounted into pods. Unlike emptyDir, which is erased when a pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This makes it possible to pre-populate nfs volumes with data, and to hand off data to pods and between pods. An NFS share can be mounted by multiple writers simultaneously.
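
A minimal sketch of a pod that mounts an NFS share directly is shown below. The server name, export path, pod name and image are assumptions made for illustration, and the share must already exist.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-example               # hypothetical name
spec:
  containers:
  - name: app
    image: busybox                # illustrative image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  volumes:
  - name: shared-data
    nfs:
      server: nfs.example.com     # assumed NFS server
      path: /exports/data         # assumed exported directory
```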

nfs Volume Example

persistentVolumeClaim

https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim

A persistentVolumeClaim volume is used to mount a persistent volume into the pod, by raising a "claim" to storage, in the form of a persistent volume claim API object. This mode allows getting storage without knowing the details of a particular environment. This is how a pod requests a persistent volume:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
      - mountPath: "/something"
        name: pvc-volume
  volumes:
    - name: pvc-volume
      persistentVolumeClaim:
        claimName: test-pvc

projected

https://kubernetes.io/docs/concepts/storage/volumes/#projected

TODO: A projected volume maps several existing volume sources into the same directory.
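
As a sketch of what such a mapping can look like, the fragment below projects a Secret, a ConfigMap and a piece of downward API metadata into a single directory. The object names example-secret and example-config are hypothetical and assumed to exist.

```yaml
volumes:
- name: all-in-one
  projected:
    sources:
    - secret:
        name: example-secret      # assumed existing Secret
    - configMap:
        name: example-config      # assumed existing ConfigMap
    - downwardAPI:
        items:
        - path: labels            # file name inside the mounted directory
          fieldRef:
            fieldPath: metadata.labels
```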

awsElasticBlockStore

https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore

An awsElasticBlockStore volume mounts an Amazon Elastic Block Store (EBS) volume into the pod. The EBS volume is a raw block volume. When the pod is removed, the contents of the EBS volume are preserved, and the EBS volume is merely unmounted. This means it can be pre-populated with data, which can be handed off to pods. To use awsElasticBlockStore volumes, the nodes on which pods are running must be AWS EC2 instances, and those instances need to be in the same region and availability zone as the EBS volume. EBS only supports a single EC2 instance mounting a volume.

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
        - mountPath: /test-ebs
          name: test-volume
  volumes:
    - name: test-volume
      # This AWS EBS volume must already exist
      awsElasticBlockStore:
        volumeID: <volume-id>
        fsType: ext4

glusterfs

A glusterfs volume allows a GlusterFS volume to be mounted into the pod.

https://kubernetes.io/docs/concepts/storage/volumes/#glusterfs

Also see:

GlusterFS Kubernetes

azureDisk

Storage Providers

Storage is made available to a Kubernetes cluster by storage providers. The Kubernetes persistent volume subsystem supports, among others:

Each storage provider has its own plugin that handles the details of exposing the storage to the Kubernetes cluster.

Storage Plugins

The terms "storage plugin" and "provisioner" can be used interchangeably. "Provisioner" is used especially when dynamic provisioning is involved. "Driver" is another equivalent term for storage plugin.

Old storage plugins were implemented as part of the main Kubernetes code tree (in-tree). This raised a series of problems: all plugins had to be open source, and their release cycle was tied to the Kubernetes release cycle. Newer plugins are based on the Container Storage Interface (CSI) and can be developed out-of-tree.

Plugin Types

kubernetes.io/no-provisioner

kubernetes.io/no-provisioner

kubernetes.io/aws-ebs

kubernetes.io/aws-ebs

kubernetes.io/gce-pd

Other Provisioners

Container Storage Interface (CSI)

Container Storage Interface (CSI) is an open standard that provides a clean interface for storage plugins and abstracts the internal Kubernetes storage details. CSI allows external storage to be used in a uniform way across multiple container orchestrators, not only Kubernetes.

API Resources

The persistent volume subsystem consists of three API resource types that allow applications to consume storage: persistent volumes, persistent volume claims and storage classes.

Persistent Volume (PV)

https://kubernetes.io/docs/concepts/storage/persistent-volumes/

The persistent volume is the API resource that maps onto external storage assets and makes them accessible to the Kubernetes cluster and to applications. Each persistent volume is an object in the Kubernetes cluster that maps back to a specific storage asset (LUN, share, blob, etc.). A single external storage asset can only be used by a single persistent volume.

A pod can use a persistent volume by indicating a persistent volume claim (see below) whose access mode, storage class name and capacity match those of the persistent volume. The pod cannot specify a persistent volume directly: the match is intermediated by the Kubernetes cluster.

From a declarative perspective, to get a persistent volume storage, the pod lists a persistentVolumeClaim volume among the required volumes in its manifest, as shown above.

Difference between a Pod Volume and a Persistent Volume

Persistent Volume Manifest

Persistent Volume Manifest

Access Mode

A persistent volume can be mounted using only one access mode at a time, even if it supports several - it is not possible for the same persistent volume to be used in ROX mode through one persistent volume claim and in RWX mode through another.

ReadWriteOnce (RWO)

This mode defines a persistent volume that can be mounted in read/write mode by a single node. A persistent volume binds to exactly one persistent volume claim, so an attempt to bind it via multiple persistent volume claims will fail. In general, block storage supports only RWO.

ReadWriteMany (RWX)

This mode defines a persistent volume that can be mounted in read/write mode by multiple nodes at the same time. In general, file storage and object storage support RWX.

ReadOnlyMany (ROX)

This mode defines a persistent volume that can be mounted in read-only mode by multiple nodes at the same time.

Reclaim Policy

The reclaim policy tells Kubernetes what to do with a persistent volume when its persistent volume claim has been released.

Delete

This policy deletes the persistent volume and the underlying associated external storage resource, on the external storage system. This is the default policy for volumes that are created dynamically via a storage class.

Retain

This policy keeps the persistent volume in the cluster, as well as the underlying associated external storage resource, on the external storage system. However, it will prevent another persistent volume claim from using the persistent volume.

To reuse the space associated with a retained persistent volume, the persistent volume should be manually deleted, the underlying external storage reformatted and then the persistent volume should be recreated.

Storage Class Name

Capacity

The capacity, expressed in the persistent volume manifest, can be less than the actual underlying physical storage, but cannot be more.
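
The access mode, reclaim policy, storage class name and capacity described above all appear as fields in the persistent volume manifest. The sketch below assumes an NFS share as the backing storage asset. The object name, class name, server and path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                        # hypothetical name
spec:
  capacity:
    storage: 5Gi                          # may be less than the real size of the asset
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep the volume and its data after release
  storageClassName: example-class         # assumed class name, matched against claims
  nfs:
    server: nfs.example.com               # assumed NFS server
    path: /exports/data                   # assumed exported directory
```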

Persistent Volume Claim (PVC)

https://kubernetes.io/docs/concepts/storage/persistent-volumes

Pods do not act directly on persistent volumes. They need a persistent volume claim, which is an API resource object that is bound to the persistent volume the pod wants to use. A persistent volume claim is similar to a ticket that authorizes a pod to use a certain persistent volume. Once an application has a persistent volume claim, it can mount the respective volume into its pod.

From a declarative perspective, to get a persistent volume storage, the pod lists a persistentVolumeClaim volume among the required volumes in its manifest, as shown above.

Persistent Volume Claims and Storage Class

A claim may request a particular storage class by specifying its name, using the attribute storageClassName. If the claim expressly requests a class, only the persistent volumes of that class can be bound to the claim.

Claims do not necessarily have to request a class. A claim with its storageClassName set to "" is always interpreted to be requesting a persistent volume with no class, so it can only be bound to persistent volumes with no class (no annotation or one set equal to ""). A claim with no storageClassName is not quite the same and is treated differently by the cluster, depending on whether the DefaultStorageClass admission controller is turned on.

The DefaultStorageClass admission controller observes the creation of PersistentVolumeClaim objects that do not request any specific storage class and automatically adds a default storage class to them. This way, users that do not request any special storage class do not need to care about storage classes at all: they get the default one. When more than one storage class is marked as default, the admission controller rejects the creation of any persistent volume claim with an error, and an administrator must revisit the StorageClass objects and mark only one as default. This admission controller ignores persistent volume claim updates; it acts only on creation.

The admission controller does not do anything when no default storage class is configured: claims with no explicit storage class will only be bound to matching persistent volumes with no storage class, if any. If the matching persistent volumes belong to an explicit storage class, they will not bind, because the storage classes of the claim and of the persistent volume must match.
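
The three cases can be illustrated with a claim manifest. This is a sketch only: the claim name test-pvc matches the pod example earlier on this page, while the class name example-class and the requested size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                  # illustrative size
  # Case 1: an explicit class; only persistent volumes of this class can bind.
  storageClassName: example-class   # assumed class name
  # Case 2: use storageClassName: "" instead to request a volume with no class.
  # Case 3: omit the field entirely to let the DefaultStorageClass admission
  #         controller, if enabled, add the default class at creation time.
```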

Persistent Volume Claim Manifest

Persistent Volume Claim Manifest

Storage Class (SC)

https://kubernetes.io/docs/concepts/storage/storage-classes/

A storage class is an API resource that allows the definition of a class or tier of storage, from which an application can then dynamically request storage. Different classes might map to quality-of-service levels, to backup policies, or to arbitrary policies defined by the cluster administrators. The storage classes that can be defined depend on the types of external storage the Kubernetes cluster has access to.

A pod can use a dynamically-provisioned persistent volume from a specific storage class by using a persistent volume claim that references that storage class by name. The persistent volume that will provide the storage does not need to be created or declared in advance: it is created dynamically. Once the storage class is deployed, its provisioner watches the API server for new persistent volume claims that reference the storage class by name. When a matching persistent volume claim appears, the required persistent volume is created dynamically.

The storage class resources are defined in the storage.k8s.io API group, version v1. Each storage class object relates to a single provisioner. StorageClass objects are immutable: they cannot be modified once deployed.
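
A minimal storage class sketch is shown below. It assumes the in-tree kubernetes.io/aws-ebs provisioner listed above, which suits a page of this age. The class name and the gp2 parameter value are illustrative, and the accepted parameters differ from one provisioner to another.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-fast                  # hypothetical name, referenced by claims
provisioner: kubernetes.io/aws-ebs    # one provisioner per storage class
parameters:
  type: gp2                           # provisioner-specific setting
reclaimPolicy: Delete                 # default for dynamically created volumes
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a pod uses the claim
```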

Storage Class Manifest

Storage Class Manifest

Default Storage Class

For the time being, the default storage class is set via annotations. If the cluster has a default storage class, a pod can be deployed using just a persistent volume claim - the storage class does not need to be manually created.
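
The annotation in question is storageclass.kubernetes.io/is-default-class. The sketch below marks a storage class as the cluster default. The class name and the choice of provisioner are assumptions made for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-default               # hypothetical name
  annotations:
    # marks this class as the default for claims that do not name a class
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs    # assumed provisioner
```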

Dynamic Volume Provisioning

https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/

As of 2019, dynamic provisioning of local volumes was still under design.

Persistent Volume Controller

The persistent volume controller matches persistent volume claims with suitable persistent volumes.

Storage Operations

Storage Operations