Kubernetes Storage Concepts: Difference between revisions
=External=
* https://kubernetes.io/docs/concepts/storage/
=Internal=
* [[Kubernetes_Concepts#Subjects|Kubernetes Concepts]]
* [[Kubernetes_Storage_Operations|Storage Operations]]
=Overview=
Kubernetes has a mature and feature-rich subsystem called the [[#API_Resources|persistent volume subsystem]], which exposes external storage to applications.
=<span id='Volume'></span><span id='Volumes'></span>Pod Volumes=
{{External|https://kubernetes.io/docs/concepts/storage/volumes}}
Regardless of where it comes from, external storage is [[Kubernetes_Pod_and_Container_Concepts#Pod_Storage|exposed to pods]] in the form of '''volumes''' (or pod volumes, as opposed to [[#Persistent Volume|persistent volumes]]).
A Kubernetes pod volume has the same lifetime as the pod that encloses it. For more details on the relationship between a pod and its volumes, see: {{Internal|Kubernetes_Pod_and_Container_Concepts#Pod_Lifecycle|Pod and Container Concepts | Pod Lifecycle}}
The volume outlives any containers that run within the pod, and data is preserved across container restarts. However, when a pod ceases to exist, its volumes cease to exist too. A pod can use multiple volumes at the same time. Conceptually, a pod volume is just a directory that is accessible to the containers in the pod. The actual backing medium of the directory and its contents are determined by the particular [[#Volume_Types|volume type]] used. More details on how volumes and volume mounts are declared in pod manifests are available in: {{Internal|Kubernetes_Pod_and_Container_Concepts#Pod_Storage|Pod Storage}}
Also see [[#Difference_between_a_Pod_Volume_and_a_Persistent_Volume|difference between a (pod) volume and a Persistent volume]].
==<span id='Volume_Type'></span>Volume Types== | |||
===configMap=== | |||
This type of volume is backed by a ConfigMap API resource instance. For more details, see:
{{Internal|Kubernetes_Cluster_Configuration_Concepts#ConfigMap_Overview|ConfigMap}}
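As an illustration, a minimal sketch of a pod that mounts a configMap volume could look as follows. The pod name, the image and the ConfigMap name <tt>my-config</tt> are hypothetical; the ConfigMap is assumed to already exist in the pod's namespace.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: Pod
metadata:
  name: configmap-example          # hypothetical pod name
spec:
  containers:
  - name: app
    image: busybox:1.36            # assumption: any image with a shell works
    command: ["sh", "-c", "ls /etc/config && sleep 3600"]
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config       # each ConfigMap key shows up as a file here
      readOnly: true
  volumes:
  - name: config-volume
    configMap:
      name: my-config              # hypothetical ConfigMap name
</syntaxhighlight>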
===secret=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#secret}}
{{External|https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets}}
This type of volume is backed by a Secret API resource instance. Secret volumes are backed by tmpfs (a RAM-backed filesystem), so they are never written to non-volatile storage. For more details, see:
{{Internal|Kubernetes_Cluster_Configuration_Concepts#Secrets|Secrets}} | |||
A typical secret volume definition looks as follows: | |||
<syntaxhighlight lang='yaml'>
kind: Pod
spec:
  [...]
  volumes:
  - name: my-secret-volume
    secret:
      # 256 decimal is 0400 octal (read-only for the owner)
      defaultMode: 256
      secretName: my-secret
</syntaxhighlight>
When projected into the pod, the secret files belong to root:root, even if the pod's security context specifies a [[Kubernetes_Pod_and_Container_Security#runAsUser|runAsUser]] or [[Kubernetes_Pod_and_Container_Security#runAsGroup|runAsGroup]]. However, if [[Kubernetes_Pod_and_Container_Security#fsGroup|fsGroup]] is defined in the pod security context, the secret files belong to fsGroup and the file permissions are automatically adjusted so they are readable by the group.
====<tt>defaultMode</tt>==== | |||
Specifies the permissions for the file created by the secret volume mount. JSON does not support octal notation, so the "0400" octal notation must be converted to decimal (256). If YAML is used, octal notation can be used. Note that if [[Kubernetes_Pod_and_Container_Security#fsGroup|fsGroup]] is declared in the pod security context, the file permissions are automatically adjusted so they are readable by the group, even if the defaultMode is 0400. | |||
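For comparison, the same permission can be written in octal when the manifest is YAML. The fragment below is a sketch that reuses the volume and secret names from the example above; 0400 octal is 4 × 64 = 256 decimal.
<syntaxhighlight lang='yaml'>
volumes:
- name: my-secret-volume
  secret:
    secretName: my-secret
    # octal notation, accepted in YAML manifests; a JSON manifest must use 256
    defaultMode: 0400
</syntaxhighlight>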
===downwardAPI=== | |||
{{Internal|Kubernetes Downward API Concepts|Kubernetes Downward API Concepts}} | |||
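A minimal sketch of a downwardAPI volume declaration, assuming the pod wants its own labels and name exposed as files. The volume name <tt>podinfo</tt> and the file paths are illustrative choices.
<syntaxhighlight lang='yaml'>
volumes:
- name: podinfo                    # hypothetical volume name
  downwardAPI:
    items:
    - path: "labels"               # file created under the mount path
      fieldRef:
        fieldPath: metadata.labels
    - path: "name"
      fieldRef:
        fieldPath: metadata.name
</syntaxhighlight>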
===emptyDir=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#emptydir}} | |||
{{External|https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#emptydirvolumesource-v1-core}} | |||
An emptyDir volume is erased when the pod is removed. | |||
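A short sketch of two emptyDir declarations: one backed by the node's default storage medium, and one backed by memory with a size limit. The volume names and the limit are illustrative assumptions.
<syntaxhighlight lang='yaml'>
volumes:
- name: scratch                    # hypothetical name; default medium
  emptyDir: {}
- name: cache                      # hypothetical name; RAM-backed (tmpfs)
  emptyDir:
    medium: Memory
    sizeLimit: 64Mi
</syntaxhighlight>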
===hostPath=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#hostpath}} | |||
A hostPath volume mounts a file or a directory from the node's host file system into the pod. | |||
{{Warn|Normally, this is not something that pods should do, as it couples a pod with a specific node. This type of volume might introduce non-determinism in the pod behavior because pods with identical configuration may behave differently on different nodes due to different files on the nodes. The recommended way to consume local storage is via [[#local|local volumes]].}} | |||
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged container or modify the file permissions on the host to be able to write to a hostPath volume. | |||
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    ...
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
</syntaxhighlight>
====path==== | |||
Required parameter that specifies the path on the local host filesystem. | |||
====type==== | |||
An empty string (default) is for backward compatibility, which means that no checks will be performed before mounting the hostPath volume. | |||
Other supported values: | |||
* DirectoryOrCreate. If nothing exists at the given path, an empty directory will be created there as needed with permission set to 0755, having the same group and ownership with Kubelet. | |||
* Directory. A directory must exist at the given path. | |||
{{Note|If we rely on the existence of the directory on the host, and we don't want to create it upon projection, then it is best to use 'type: Directory'. If the directory does not exist on the host path, the pod creation will fail with "MountVolume.SetUp failed for volume volume-1: hostPath type check failed: /tmp/x is not a directory", as a fail-early test. Also see [[#hostPath_on_single-node_Kubernetes_Clusters_.28minikube.2C_Docker_Desktop_Kubernetes.29|hostPath on single-node Kubernetes Clusters (minikube,_Docker_Desktop_Kubernetes)]] below.}} | |||
* FileOrCreate. If nothing exists at the given path, an empty file will be created there as needed with permission set to 0644, having the same group and ownership with Kubelet. | |||
* File. A file must exist at the given path | |||
* Socket. A UNIX socket must exist at the given path | |||
* CharDevice. A character device must exist at the given path | |||
* BlockDevice. A block device must exist at the given path | |||
====hostPath on single-node Kubernetes Clusters (minikube, Docker Desktop Kubernetes)==== | |||
Single-node Kubernetes clusters running in VMs, such as Docker Desktop Kubernetes or minikube allow access to their host paths only if those paths are "shared" via the cluster's configuration. For Docker Desktop Kubernetes, host directories can be "shared" via Preferences → Resources → File Sharing (see [[Docker_Desktop#Docker_Desktop_File_Sharing|Docker Desktop File Sharing]]). For minikube running with a VM driver, directories need to be individually mounted, while there are several mounted by default (see [[Minikube_Operations#Mount|minikube mount]]). Minikube in [[Minikube_Concepts#None|bare-metal mode]] offers direct access to host directories. | |||
If the path to be mounted as "hostPath" is not among the shared directories, it is interpreted as relative to the embedded VM that runs the single-node Kubernetes cluster, not to the "outer" host, and it is usually created inside the VM. Because the directory is created on demand, it belongs to root:root, which explains why a non-root user cannot write into it. To prevent this behavior and fail early, use "type: Directory" for hostPath.
===local=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#local}} | |||
A local volume is storage physically attached to the node host. As such, a local volume on a certain node will be only available to pods scheduled on that node. This storage model makes sense for [[Kubernetes StatefulSet|StatefulSets]], but not for other pod deployment models: using local storage ties the application to specific nodes, making it harder to schedule. If that node or local volume encounters a failure and becomes inaccessible, then that pod also becomes inaccessible. In addition, many cloud providers do not provide extensive data durability guarantees for local storage, so all data could be lost in certain scenarios. Applications that are suitable for local storage should be tolerant of node failures, data unavailability, and data loss (e.g. [[Cassandra]]). | |||
The local volume mechanism allows exposing a local disk, partition or [[Kubernetes_Storage_Operations#Expose_a_Local_Directory|directory]]. The storage can be exposed to the pod as [[Storage_Concepts#Block_Storage|block storage]] (an alpha feature at the time of writing, useful to workloads that need to directly access block devices and manage their own data format) or as a [[Storage_Concepts#Block_Storage|filesystem]].
Local volumes are available since v1.14. | |||
Before any persistent volume claims for local persistent volumes are created, a dedicated storage class with the [[Kubernetes_Storage_Class_Manifest#volumeBindingMode|volumeBindingMode]] set to '[[Kubernetes_Storage_Class_Manifest#WaitForFirstConsumer|WaitForFirstConsumer]]' must be created. An example is available [[Kubernetes_Storage_Operations#Storage_Class|here]]. | |||
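A sketch of such a storage class is shown below. The name <tt>local-storage</tt> is an assumption; the provisioner is the [[#kubernetes.io.2Fno-provisioner|kubernetes.io/no-provisioner]] plugin, which signals that volumes of this class are not dynamically provisioned.
<syntaxhighlight lang='yaml'>
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage              # hypothetical class name
provisioner: kubernetes.io/no-provisioner
# delay binding until a pod that uses the claim is scheduled
volumeBindingMode: WaitForFirstConsumer
</syntaxhighlight>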
====local Volume Operations==== | |||
* [[Kubernetes_Storage_Operations#Create_a_Local_Volume|Create a local volume]] | |||
===nfs=== | |||
{{External|[https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#nfsvolumesource-v1-core nfs Volume Manifest]}} | |||
An nfs volume allows an existing [[Linux_NFS_Concepts#Share|NFS share]] to be mounted into pods. Unlike [[#emptyDir|emptyDir]], which is erased when a pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This makes it possible to pre-populate nfs volumes with data, and to hand off data to pods and between pods. NFS can be mounted by multiple writers simultaneously. The NFS server must be running and the share exported before it can be used as an nfs volume. This is how a pod mounts an NFS volume:
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    ...
    volumeMounts:
    - mountPath: "/something"
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      # the hostname or IP address of the NFS server
      server: 10.10.2.249
      # the path exported by the NFS server
      path: /opt/nfs0
</syntaxhighlight>
{{Warn|'''Important''' The Kubernetes node hosts on which pods that attempt to mount nfs volumes are scheduled must have the NFS client dependencies installed, as described in [[Linux_NFS_Installation#Client_Installation|NFS Client Installation]], otherwise the mount will fail with messages similar to "mount: wrong fs type, bad option, bad superblock on 1..."}}
NFS volume example: | |||
{{Internal|Kubernetes_Storage_Operations#NFS_volume_Example|nfs Volume Example}} | |||
Also see: {{Internal|Quay.io/kubernetes_incubator/nfs-provisioner|quay.io/kubernetes_incubator/nfs-provisioner}} | |||
===persistentVolumeClaim=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim}} | |||
A persistentVolumeClaim volume is used to mount a [[#Persistent_Volume_.28PV.29|persistent volume]] into the pod, by raising a "claim" to storage, in form of a [[#Persistent_Volume_Claim_.28PVC.29|persistent volume claim]] API object. This mode allows getting storage without knowing the details of a particular environment. This is how a pod requests a persistent volume: | |||
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    ...
    volumeMounts:
    - mountPath: "/something"
      name: pvc-volume
  volumes:
  - name: pvc-volume
    persistentVolumeClaim:
      claimName: test-pvc
</syntaxhighlight>
====persistentVolumeClaim Idiosyncrasies==== | |||
If the same claim name is reused for a volume with a different name, the pod will fail to start with:
<syntaxhighlight lang='text'> | |||
Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[default-token-j6wgp persistent-storage persistent-storage-2]: timed out waiting for the condition | |||
</syntaxhighlight> | |||
====persistentVolumeClaim and hostPath==== | |||
A hostPath (local directory) can be exposed to a pod as a [[#Persistent_Volume_.28PV.29|persistent volume]], attached to the pod via a [[#Persistent_Volume_Claim_.28PVC.29|persistent volume claim]]: {{Internal|Kubernetes persistentVolume and hostPath|Exposing a hostPath as persistent volume}} | |||
===projected=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#projected}} | |||
<font color=darkgray>TODO:</font> A projected volume maps several existing volume sources into the same directory. | |||
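A minimal sketch of a projected volume that combines a secret, a config map and downward API data in one directory. The volume name and the referenced <tt>my-secret</tt> and <tt>my-config</tt> objects are hypothetical and assumed to exist in the pod's namespace.
<syntaxhighlight lang='yaml'>
volumes:
- name: all-in-one                 # hypothetical volume name
  projected:
    sources:
    - secret:
        name: my-secret            # hypothetical Secret
    - configMap:
        name: my-config            # hypothetical ConfigMap
    - downwardAPI:
        items:
        - path: "labels"
          fieldRef:
            fieldPath: metadata.labels
</syntaxhighlight>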
===awsElasticBlockStore=== | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore}} | |||
An awsElasticBlockStore volume mounts an [[Amazon_Elastic_Block_Store_Concepts#Volume|Amazon Elastic Block Store volume]] into the pod. The EBS volume is a raw block volume. When the pod is removed, the contents of the EBS volume are preserved, and the EBS volume is merely unmounted. This means it can be pre-populated with data, which can be handed off to pods. To use awsElasticBlockStore volumes, the nodes on which pods are running must be AWS EC2 instances, and those instances need to be in the same region and availability zone as the EBS volume. EBS only supports a single EC2 instance mounting a volume.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    ...
    volumeMounts:
    - mountPath: /test-ebs
      name: test-volume
  volumes:
  - name: test-volume
    # This AWS EBS volume must already exist
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4
</syntaxhighlight>
===glusterfs=== | |||
A glusterfs volume allows a [[GlusterFS_Kubernetes#glusterfs_Volumes|GlusterFS]] volume to be mounted into the pod. | |||
{{External|https://kubernetes.io/docs/concepts/storage/volumes/#glusterfs}} | |||
Also see: {{Internal|GlusterFS Kubernetes|GlusterFS Kubernetes}} | |||
===azureDisk=== | |||
=<span id='Mounting_a_Volume_in_Pod'></span>Mounting Volumes in Pods= | |||
{{Internal|Kubernetes Mounting Volumes in Pods|Mounting Volumes in Pods}} | |||
=Storage Providers=
Line 38: | Line 229: | ||
* [[Storage_Concepts#Object_Storage|object storage blobs]]
* Amazon [[Amazon Elastic Block Store|Elastic Block Store]] [[Amazon_Elastic_Block_Store_Concepts#Block_Device|block devices]]
* Azure File resources, AzureDisk. See [[#Azure_Kubernetes_Storage|Azure Kubernetes Storage]] below.
* GCE Persistent Disks
* [[GlusterFS Kubernetes|GlusterFS volumes]]
Each storage provider has its own [[#Storage_Plugin|plugin]] that handles the details of exposing the storage to the Kubernetes cluster.
==Azure Kubernetes Storage== | |||
{{Internal|Azure Kubernetes Storage|Azure Kubernetes Storage}} | |||
=<span id='Storage_Plugin'></span>Storage Plugins=
Line 48: | Line 242: | ||
The terms "storage plugin" and "provisioner" can be used interchangeably. "Provisioner" is used especially when [[#Dynamic_Volume_Provisioning|dynamic provisioning]] is involved. "Driver" is another equivalent term for storage plugin.
Old storage plugins used to be implemented as part of the main Kubernetes code tree (<span id='In-Tree_Storage_Plugins'></span>in-tree storage plugins), which raised a series of problems: all of them had to be open-source, and their release cycle was tied to the Kubernetes release cycle. Newer plugins are based on the [[#CSI|Container Storage Interface (CSI)]] and can be developed out-of-tree.
==Plugin Types==
===kubernetes.io/no-provisioner=== | |||
{{Internal|kubernetes.io/no-provisioner|kubernetes.io/no-provisioner}} | |||
===kubernetes.io/aws-ebs===
{{Internal|kuberentes.io/aws-ebs|kubernetes.io/aws-ebs}}
===kubernetes.io/gce-pd===
===kubernetes.io/azure-file=== | |||
{{Internal|Azure_Kubernetes_Storage#kubernetes.io.2Fazure-file_Provisioner|kubernetes.io/azure-file}} | |||
===Other Provisioners=== | |||
* <span id='quay.io.2Fkubernetes_incubator.2Fnfs-provisioner'></span>[[quay.io/kubernetes_incubator/nfs-provisioner|quay.io/kubernetes_incubator/nfs-provisioner]] | |||
* https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner - a local volume static provisioner that manages the persistent volume lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and cleaning up the disks when released. It does not support dynamic provisioning. | |||
=<span id='Container_Storage_Interface'></span><span id='CSI'></span>Container Storage Interface (CSI)=
Container Storage Interface (CSI) is an open standard that provides a clean interface for [[#Storage_Plugin|storage plugins]] and abstracts the internal Kubernetes storage details. CSI allows external storage to be leveraged in a uniform way across multiple container orchestrators, not only Kubernetes. Both [[Storage_Concepts#Block_Storage|block]] and [[Storage_Concepts#Filesystem_Storage|filesystem]] storage can be exposed via CSI.
==CSIDriver== | |||
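CSIDriver is the Kubernetes API resource that represents a CSI driver installed in the cluster. The CSIDriver objects known to the cluster can be listed with: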
<syntaxhighlight lang='bash'> | |||
kubectl get csidriver | |||
</syntaxhighlight> | |||
==Amazon EFS CSI== | |||
{{Internal|Amazon EFS CSI|Amazon EFS CSI}} | |||
==Azure CSI== | |||
{{Internal|Azure Kubernetes Storage#CSI|Azure Kubernetes Storage | CSI}} | |||
==Persistent Volume CSI Configuration== | |||
See [[#CSI_Configuration|CSI]] below. | |||
=<span id='Persistent_Volume_Subsystem '></span>API Resources=
The '''persistent volume subsystem''' consists of the following three API resource types that allow applications to consume storage: [[#Persistent_Volume|persistent volumes]], [[#Persistent_Volume_Claim|persistent volume claims]] and [[#Storage_Class|storage classes]]:
==<span id='StorageClass'></span><span id='SC'></span><span id='Storage_Class'></span>Storage Class (SC)== | |||
{{External|https://kubernetes.io/docs/concepts/storage/storage-classes/}} | |||
A storage class is an API resource that allows the definition of a class or tier of storage, from which an application can then dynamically request storage. Storage classes are not namespaced. | |||
For an overview of how storage classes, volumes and volume claims work together, see [[#Volume_and_Claim_Lifecycle_and_Binding|Volume and Claim Lifecycle and Binding]] below. | |||
Different classes might map to quality-of-service levels, to backup policies, or to arbitrary policies defined by the cluster administrators. The storage classes that can be defined depend on the types of external storage the Kubernetes cluster has access to. A pod can use a dynamically-provisioned persistent volume from a specific storage class by using a persistent volume claim that references that storage class by name. The [[#Persistent_Volume|persistent volume]] that will provide the storage does not need to be created or declared in advance: it is created dynamically by the provisioner associated with the storage class. Once the storage class is deployed, the provisioner watches the API server for new PVC objects that reference the storage class name. When a matching persistent volume claim appears, the required persistent volume is created dynamically.
The storage class resources are defined in the [[Kubernetes API Resources Concepts#StorageClass|storage.k8s.io/v1]] API group. Each storage class object relates to a single provisioner. StorageClass objects are immutable: they cannot be modified once deployed.
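As a sketch of what such an object can look like, the manifest below defines a class backed by the in-tree [[#kubernetes.io.2Faws-ebs|kubernetes.io/aws-ebs]] provisioner. The class name <tt>fast</tt> and the <tt>gp2</tt> volume type are illustrative assumptions; newer clusters typically use a [[#CSI|CSI]] driver as provisioner instead.
<syntaxhighlight lang='yaml'>
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                       # hypothetical class name
provisioner: kubernetes.io/aws-ebs # in-tree provisioner, used here for illustration
parameters:
  type: gp2                        # provisioner-specific parameter (assumption)
reclaimPolicy: Delete
volumeBindingMode: Immediate
</syntaxhighlight>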
===Storage Class Manifest=== | |||
{{Internal|Kubernetes Storage Class Manifest|Storage Class Manifest}} | |||
===Default Storage Class=== | |||
For the time being, the default storage class is set via [[Kubernetes Storage Class Manifest#default_storageclass_manifest|annotations]]. | |||
If the cluster has a default storage class, a pod can be deployed using just a persistent volume claim - the storage class does not need to be manually created. | |||
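The annotation in question is <tt>storageclass.kubernetes.io/is-default-class</tt>. A sketch of a storage class marked as default follows; the class name and the provisioner are assumptions chosen for illustration.
<syntaxhighlight lang='yaml'>
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                   # hypothetical class name
  annotations:
    # marks this class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs # assumption: any valid provisioner
</syntaxhighlight>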
===Examples=== | |||
* [[Amazon_EFS_CSI_Operations#Deploy_the_EFS_Storage_Class|Amazon EFS CSI Storage Class]] | |||
==<span id='PersistentVolume'></span><span id='PV'></span><span id='Persistent_Volume'></span>Persistent Volume (PV)==
{{External|https://kubernetes.io/docs/concepts/storage/persistent-volumes/}}
The persistent volume is the API resource that maps onto external storage assets and makes them accessible to the Kubernetes cluster and to applications. Each persistent volume is an object in the Kubernetes cluster that maps back to a specific storage asset (LUN, share, blob, etc.). A single external storage asset can only be used by a single persistent volume.
The persistent volume lasts for the cluster lifetime, unlike a [[#Volume|pod volume]], which lasts for the pod lifetime.
A pod can use a persistent volume by indicating a [[#Persistent_Volume_Claim|persistent volume claim]] (see below) whose [[#Access_Mode|access mode]], [[#Storage_Class_Name|storage class name]] and [[#Capacity|capacity]] match those of the persistent volume. The pod cannot specify a persistent volume directly; the match is intermediated by the Kubernetes cluster. For an overview of how storage classes, volumes and volume claims work together, see [[#Volume_and_Claim_Lifecycle_and_Binding|Volume and Claim Lifecycle and Binding]] below.
From a declarative perspective, to get a persistent volume storage, the pod lists a [[#persistentVolumeClaim|persistentVolumeClaim volume]] among the required volumes in its manifest, as shown [[#persistentVolumeClaim|above]]. | |||
===Difference between a Pod Volume and a Persistent Volume===
Line 73: | Line 317: | ||
{{Internal|Kubernetes Persistent Volume Manifest|Persistent Volume Manifest}}
===Access Mode===
A Persistent Volume can be mounted using only one access mode at a time, even if it supports several. It is not possible for the same Persistent Volume to be used in ReadOnlyMany mode by one consumer and in ReadWriteMany mode by another.
====ReadWriteOnce (RWO)====
This mode defines a Persistent Volume that can be mounted in read/write mode by a single node. Since a Persistent Volume is bound to exactly one Persistent Volume Claim, an attempt to bind it via multiple Persistent Volume Claims will fail. In general, [[Storage_Concepts#Block_Storage|block storage]] only supports RWO.
====ReadWriteMany (RWM)====
This mode defines a Persistent Volume that can be mounted in read/write mode by multiple nodes at the same time. kubectl abbreviates this mode as RWX. In general, [[Storage_Concepts#File_Storage|file storage]] and [[Storage_Concepts#Object_Storage|object storage]] support RWM.
====ReadOnlyMany (ROM)====
This mode defines a Persistent Volume that can be mounted in read-only mode by multiple nodes at the same time. kubectl abbreviates this mode as ROX.
===Reclaim Policy===
The reclaim policy tells Kubernetes what to do with a persistent volume when its persistent volume claim has been released.
Line 88: | Line 332: | ||
This policy deletes the persistent volume and the underlying associated external storage resource, on the external storage system. This is the default policy for volumes that are created dynamically via a [[#Storage_Class|storage class]].
====Retain====
This policy keeps the persistent volume in the cluster, as well as the underlying associated external storage resource, on the external storage system. However, it will prevent another persistent volume claim from using the persistent volume. To reuse the space associated with a retained persistent volume, the persistent volume should be manually deleted, the underlying external storage reformatted and then the persistent volume should be recreated.
Local persistent volumes can only support a "Retain" reclaim policy. The administrator must manually clean up and set up the local volume again for reuse. | |||
===Storage Class Name===
Line 96: | Line 340: | ||
The capacity, expressed in the persistent volume manifest, can be less than the actual underlying physical storage, but cannot be more.
===Node Affinity=== | |||
The persistent volume scheduler uses the node affinity configuration of a local persistent volume to understand what node host the storage for the volume is available on. | |||
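A sketch of a local persistent volume that carries such a node affinity configuration is shown below. The volume name, the path, the storage class name and the node name <tt>example-node</tt> are hypothetical placeholders.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv           # hypothetical name
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage  # hypothetical class with WaitForFirstConsumer
  local:
    path: /mnt/disks/ssd1          # hypothetical directory or mount point on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node           # hypothetical node that hosts the storage
</syntaxhighlight>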
===<span id='CSI_Configuration'></span>CSI=== | |||
====driver==== | |||
====volumeHandle==== | |||
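A sketch of how these two fields appear in a persistent volume manifest, assuming the [[Amazon EFS CSI]] driver. The volume name, the storage class name and the filesystem ID used as volume handle are hypothetical.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv                     # hypothetical name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc         # hypothetical class name
  csi:
    driver: efs.csi.aws.com        # name of the CSI driver that handles the volume
    volumeHandle: fs-0123456789abcdef0   # hypothetical ID of the external storage asset
</syntaxhighlight>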
==<span id='PersistentVolumeClaim'></span><span id='PVC'></span><span id='Persistent_Volume_Claim'></span>Persistent Volume Claim (PVC)==
{{External|https://kubernetes.io/docs/concepts/storage/persistent-volumes}}
Pods do not act directly on [[#Persistent_Volume|persistent volumes]]; they need a Persistent Volume Claim, an API resource object that is bound to the Persistent Volume the pod wants to use. A Persistent Volume Claim is similar to a ticket that authorizes a pod to use a certain Persistent Volume. Once an application has a Persistent Volume Claim, it can mount the respective volume into its pod.
Persistent Volume Claims are namespaced, so their "effective" name is <namespace>/<claim-name>. Two different Persistent Volume Claims with the same name, but declared in different namespaces are different, so if one is bound to a Persistent Volume, the other cannot be bound to the same volume. | |||
A Persistent Volume Claim can be bound to one and only one Persistent Volume. However, multiple pods can use the same Persistent Volume Claim, accessing, and sharing the same Persistent Volume, if the persistent volume storage allows sharing. For an in-depth discussion on how storage classes, volumes and volume claims work together, see [[#Volume_and_Claim_Lifecycle_and_Binding|Volume and Claim Lifecycle and Binding]] below. | |||
From a declarative perspective, to get a persistent volume storage, the pod lists a [[#persistentVolumeClaim|persistentVolumeClaim volume]] among the required volumes in its manifest, as shown [[#persistentVolumeClaim|above]]. | |||
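For reference, a minimal sketch of the claim object itself. The name matches the <tt>test-pvc</tt> claim used in the pod example; the namespace, the storage class name and the requested size are assumptions.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default               # claims are namespaced
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast           # hypothetical storage class name
  resources:
    requests:
      storage: 1Gi                 # requested capacity (assumption)
</syntaxhighlight>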
===Persistent Volume Claims and Storage Class=== | |||
A claim may request a particular [[#Storage_Class|storage class]] by specifying its name, using the attribute [[Kubernetes_Persistent_Volume_Claim_Manifest#storageClassName|storageClassName]]. If the claim expressly requests a class, only the persistent volumes of that class can be bound to the claim. Claims do not necessarily have to request a class. A claim with its [[Kubernetes_Persistent_Volume_Claim_Manifest#storageClassName|storageClassName]] set to "" is always interpreted to be requesting a persistent volume with no class, so it can only be bound to persistent volumes with no class (no annotation or one set equal to ""). A claim with no [[Kubernetes_Persistent_Volume_Claim_Manifest#storageClassName|storageClassName]] is not quite the same and is treated differently by the cluster, depending on whether the [[Kubernetes_Admission_Controller_Concepts#DefaultStorageClass|DefaultStorageClass admission controller]] is turned on. The DefaultStorageClass admission controller observes creation of PersistentVolumeClaim objects that do not request any specific storage class and automatically adds a [[Kubernetes_Storage_Concepts#Default_Storage_Class|default storage class]] to them. This way, users that do not request any special storage class do not need to care about storage classes at all: they get the default one. When more than one storage class is marked as default, the admission controller rejects the creation of any persistent volume claim with an error, and an administrator must revisit the StorageClass objects and mark only one as default. This admission controller ignores any persistent volume claim updates; it acts only on creation. The admission controller does not do anything when no default storage class is configured: the claims with no explicit storage class will only be bound to matching persistent volumes with no storage class, if any. If the matching persistent volumes belong to an explicit storage class, they won't bind, because the storage classes of the claim and of the persistent volume must match for binding to occur.
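The three cases discussed above can be sketched as three claims that differ only in how they treat storageClassName. This is a minimal illustration, not a manifest taken from a real cluster: the claim names, the class name "fast" and the requested size are assumptions.
<syntaxhighlight lang='yaml'>
# Case 1: the claim expressly requests a class; only volumes of class "fast" can bind.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-explicit-class     # hypothetical name
spec:
  storageClassName: fast         # hypothetical storage class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Case 2: storageClassName set to ""; only volumes with no class can bind.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-no-class           # hypothetical name
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Case 3: storageClassName omitted; the DefaultStorageClass admission controller,
# if enabled and a default class exists, fills in the default class on creation.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-default-class      # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
</syntaxhighlight>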
An optional persistent volume name can be specified in the persistent volume claim spec, using the volumeName attribute. <font color=darkgray>More qualified content here.</font>
===Persistent Volume Claim Manifest===
{{Internal|Kubernetes Persistent Volume Claim Manifest|Persistent Volume Claim Manifest}}
===Persistent Volume Claim Template===
{{Internal|Kubernetes_StatefulSet#Persistent_Volume_Claim_Template|StatefulSet Persistent Volume Claim Template}}
==Volume and Claim Lifecycle and Binding==
{{External|https://kubernetes.io/docs/concepts/storage/persistent-volumes/#lifecycle-of-a-volume-and-claim}}
<font color=darkgray>TODO: next time I am here, process https://kubernetes.io/docs/concepts/storage/persistent-volumes/#lifecycle-of-a-volume-and-claim and then integrate this:
A persistent volume is a cluster-level resource. A persistent volume claim is a request for a persistent volume resource, and acts as a claim check to the resource. To get access to storage, a pod lists a persistent volume claim in its volumes list. The persistent volume claim must exist as an API resource. It usually specifies a storage class. <font color='orange'>Can a PVC request a specific PV, not a generic PV from a storage class?</font> During the pod deployment, the appropriate persistent volume from the storage class is identified (if it exists), allocated and bound to the persistent volume claim, and thus bound to the pod. A persistent volume can be associated with one and only one persistent volume claim. However, multiple pods can use the same persistent volume claim, thus sharing the persistent volume. The binding between a persistent volume and a persistent volume claim is reflected both in the manifest of the persistent volume claim and in the manifest of the persistent volume:
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  [...]
spec:
  [...]
  volumeMode: Filesystem
  volumeName: efs-pv
</syntaxhighlight>
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  [...]
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: efs-claim
    namespace: dev
    resourceVersion: "18663986"
    uid: a139cd2f-3223-4caa-bdd1-9b6d80ca7b1
</syntaxhighlight>
</font>
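On the open question above of whether a claim can request a specific volume: the upstream persistent volume documentation describes pre-binding, in which the claim names the volume in spec.volumeName. A minimal sketch follows, reusing the efs-claim and efs-pv names and the dev namespace from the manifests above; the access mode and requested size are assumptions. The empty storageClassName keeps the DefaultStorageClass admission controller from assigning a default class to the claim.
<syntaxhighlight lang='yaml'>
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: dev
spec:
  storageClassName: ""     # explicitly no class, so no default class is injected
  volumeName: efs-pv       # request binding to this specific persistent volume
  accessModes:
    - ReadWriteMany        # assumption: must be compatible with the volume's access modes
  resources:
    requests:
      storage: 5Gi         # assumption: must not exceed the volume's capacity
</syntaxhighlight>
The volume must still satisfy the claim's access mode and capacity for the binding to succeed.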
=Dynamic Volume Provisioning=
{{External|https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/}}
As of 2019, [https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/ dynamic provisioning of local volumes is under design].
=Persistent Volume Controller=
The persistent volume controller matches persistent volume claims with suitable persistent volumes. It is part of the [[Kubernetes_Control_Plane_and_Data_Plane_Concepts#Persistent_Volume_Controller|controller manager]].
=<span id='Persistence_Operations'></span>Storage Operations=
{{Internal|Kubernetes_Storage_Operations#Get_Information_about_Persistent_Volumes|Storage Operations}}
Latest revision as of 20:37, 26 September 2021
External
Internal
Overview
Kubernetes has a mature and feature-rich subsystem called the persistent volume subsystem, which exposes external storage to applications.
Pod Volumes
Regardless of where it comes from, external storage is exposed to pods in the form of volumes (or pod volumes, as opposed to persistent volumes).
A Kubernetes pod volume has the same lifetime as the pod that encloses it. For more details on the relationship between a pod and its volumes, see:
The volume outlives any containers that run within the pod, and data is preserved across container restarts. However, when a pod ceases to exist, the volume also ceases to exist. A pod can use multiple volumes at the same time. Conceptually, a pod volume is just a directory, which is accessible to the containers in the pod. However, the actual backing medium of the directory and its contents are determined by the particular volume type used. More details on how volumes and volume mounts are declared in the pod manifests are available in:
Also see difference between a (pod) volume and a Persistent volume.
Volume Types
configMap
This type of volume is backed by a ConfigMap API resource instance. For more details, see:
secret
This type of volume is backed by a Secret API resource instance. secret volumes are backed by tmpfs (RAM-backed filesystem) so they are never written to non-volatile storage. For more details, see:
A typical secret volume definition looks as follows:
kind: Pod
spec:
  [...]
  volumes:
    - name: my-secret-volume
      secret:
        defaultMode: 256
        secretName: my-secret
When projected into the pod, the secret files belong to root:root, even if the pod's security context specifies a runAsUser or runAsGroup. However, if fsGroup is defined in the pod security context, the secret files belong to fsGroup and the file permissions are automatically adjusted so they are readable by the group.
defaultMode
Specifies the permissions for the file created by the secret volume mount. JSON does not support octal notation, so the "0400" octal notation must be converted to decimal (256). If YAML is used, octal notation can be used. Note that if fsGroup is declared in the pod security context, the file permissions are automatically adjusted so they are readable by the group, even if the defaultMode is 0400.
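As a sketch of the YAML alternative mentioned above, the same permissions can be written in octal notation. The pod name, image, command and mount path below are illustrative assumptions; the volume and secret names match the definition shown earlier.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-octal-example        # hypothetical name
spec:
  containers:
    - name: test
      image: busybox                # assumption: any image would do
      command: ["sleep", "3600"]
      volumeMounts:
        - name: my-secret-volume
          mountPath: /etc/my-secret # hypothetical mount path
          readOnly: true
  volumes:
    - name: my-secret-volume
      secret:
        secretName: my-secret
        defaultMode: 0400           # octal notation; equivalent to decimal 256
```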
downwardAPI
emptyDir
An emptyDir volume is erased when the pod is removed.
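A minimal sketch of a pod that declares and mounts an emptyDir volume is shown below. The pod name, image, command, volume name and mount path are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-example       # hypothetical name
spec:
  containers:
    - name: test
      image: busybox           # assumption: any image would do
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch  # hypothetical mount path
  volumes:
    - name: scratch
      emptyDir: {}             # starts empty; erased when the pod is removed
```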
hostPath
A hostPath volume mounts a file or a directory from the node's host file system into the pod.
Normally, this is not something that pods should do, as it couples a pod with a specific node. This type of volume might introduce non-determinism in the pod behavior because pods with identical configuration may behave differently on different nodes due to different files on the nodes. The recommended way to consume local storage is via local volumes.
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged container or modify the file permissions on the host to be able to write to a hostPath volume.
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
        - mountPath: /test-pd
          name: test-volume
  volumes:
    - name: test-volume
      hostPath:
        # directory location on host
        path: /data
        # this field is optional
        type: Directory
path
Required parameter that specifies the path on the local host filesystem.
type
An empty string (default) is for backward compatibility, which means that no checks will be performed before mounting the hostPath volume.
Other supported values:
- DirectoryOrCreate. If nothing exists at the given path, an empty directory will be created there as needed with permission set to 0755, having the same group and ownership with Kubelet.
- Directory. A directory must exist at the given path.
If we rely on the existence of the directory on the host, and we don't want to create it upon projection, then it is best to use 'type: Directory'. If the directory does not exist on the host path, the pod creation will fail with "MountVolume.SetUp failed for volume volume-1: hostPath type check failed: /tmp/x is not a directory", as a fail-early test. Also see hostPath on single-node Kubernetes Clusters (minikube,_Docker_Desktop_Kubernetes) below.
- FileOrCreate. If nothing exists at the given path, an empty file will be created there as needed with permission set to 0644, having the same group and ownership with Kubelet.
- File. A file must exist at the given path
- Socket. A UNIX socket must exist at the given path
- CharDevice. A character device must exist at the given path
- BlockDevice. A block device must exist at the given path
hostPath on single-node Kubernetes Clusters (minikube, Docker Desktop Kubernetes)
Single-node Kubernetes clusters running in VMs, such as Docker Desktop Kubernetes or minikube allow access to their host paths only if those paths are "shared" via the cluster's configuration. For Docker Desktop Kubernetes, host directories can be "shared" via Preferences → Resources → File Sharing (see Docker Desktop File Sharing). For minikube running with a VM driver, directories need to be individually mounted, while there are several mounted by default (see minikube mount). Minikube in bare-metal mode offers direct access to host directories.
If the path to be mounted as "hostPath" is not among the shared directories, it is interpreted as relative to the embedded VM that runs the single-node Kubernetes cluster, not to the "outer" host, and it is usually created inside the VM. Since the directory is created on the fly, it belongs to root:root, which explains why a non-root user cannot write into it. To prevent this behavior and fail early, use "type: Directory" for hostPath.
local
A local volume is storage physically attached to the node host. As such, a local volume on a certain node will be only available to pods scheduled on that node. This storage model makes sense for StatefulSets, but not for other pod deployment models: using local storage ties the application to specific nodes, making it harder to schedule. If that node or local volume encounters a failure and becomes inaccessible, then that pod also becomes inaccessible. In addition, many cloud providers do not provide extensive data durability guarantees for local storage, so all data could be lost in certain scenarios. Applications that are suitable for local storage should be tolerant of node failures, data unavailability, and data loss (e.g. Cassandra).
The local volume mechanism allows exposing a local disk, partition or directory. The storage can be exposed to the pod as a block storage (alpha feature at the time of the writing - this is useful to workloads that need to directly access block devices and manage their own data format) or as a filesystem.
Local volumes are available since v1.14.
Before any persistent volume claims for local persistent volumes are created, a dedicated storage class with the volumeBindingMode set to 'WaitForFirstConsumer' must be created. An example is available here.
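A storage class of the kind described above might look like the following sketch. The class name local-storage is an assumption; kubernetes.io/no-provisioner indicates that volumes of this class are not provisioned dynamically.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                      # hypothetical class name
provisioner: kubernetes.io/no-provisioner  # no dynamic provisioning for local volumes
volumeBindingMode: WaitForFirstConsumer    # delay binding until a pod using the claim is scheduled
```

Delaying the binding lets the scheduler take the pod's other constraints into account before a node-bound local volume is chosen.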
local Volume Operations
nfs
An nfs volume allows an existing NFS share to be mounted into pods. Unlike emptyDir, which is erased when a pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This makes it possible to pre-populate nfs volumes with data, and to hand off data to pods and between pods. NFS can be mounted by multiple writers simultaneously. The NFS server must be running and the share exported before it can be used as an nfs volume. This is how a pod mounts an NFS volume:
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
        - mountPath: "/something"
          name: nfs-volume
  volumes:
    - name: nfs-volume
      nfs:
        # the URL of the NFS server
        server: 10.10.2.249
        path: /opt/nfs0
Important: The Kubernetes nodes on which pods that attempt to mount nfs volumes are scheduled must have the NFS client dependencies installed, as described in NFS Client Installation, otherwise the mount will fail with messages similar to "mount: wrong fs type, bad option, bad superblock on 1..."
NFS volume example:
Also see:
persistentVolumeClaim
A persistentVolumeClaim volume is used to mount a persistent volume into the pod, by raising a "claim" to storage, in form of a persistent volume claim API object. This mode allows getting storage without knowing the details of a particular environment. This is how a pod requests a persistent volume:
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
        - mountPath: "/something"
          name: pvc-volume
  volumes:
    - name: pvc-volume
      persistentVolumeClaim:
        claimName: test-pvc
persistentVolumeClaim Idiosyncrasies
If the same claim name is reused for a volume with a different name, the pod will not start, failing with:
Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[default-token-j6wgp persistent-storage persistent-storage-2]: timed out waiting for the condition
persistentVolumeClaim and hostPath
A hostPath (local directory) can be exposed to a pod as a persistent volume, attached to the pod via a persistent volume claim:
projected
TODO: A projected volume maps several existing volume sources into the same directory.
awsElasticBlockStore
An awsElasticBlockStore volume mounts an Amazon Elastic Block Store volume into the pod. The EBS volume is a raw block volume. When the pod is removed, the contents of the EBS volume are preserved, and the EBS volume is merely unmounted. This means it can be pre-populated with data, which can be handed off to pods. To use awsElasticBlockStore volumes, the nodes on which pods are running must be AWS EC2 instances, and those instances need to be in the same region and availability zone as the EBS volume. EBS only supports a single EC2 instance mounting a volume.
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      ...
      volumeMounts:
        - mountPath: /test-ebs
          name: test-volume
  volumes:
    - name: test-volume
      # This AWS EBS volume must already exist
      awsElasticBlockStore:
        volumeID: <volume-id>
        fsType: ext4
glusterfs
A glusterfs volume allows a GlusterFS volume to be mounted into the pod.
Also see:
azureDisk
Mounting Volumes in Pods
Storage Providers
Storage is made available to a Kubernetes cluster by storage providers. The Kubernetes persistent volume subsystem supports, among others:
- iSCSI volumes
- SMB
- NFS volumes
- Enterprise storage arrays from vendors like EMC and NetApp
- object storage blobs
- Amazon Elastic Block Store block devices
- Azure File resources, AzureDisk. See Azure Kubernetes Storage below.
- GCE Persistent Disks
- GlusterFS volumes
Each storage provider has its own plugin that handles the details of exposing the storage to the Kubernetes cluster.
Azure Kubernetes Storage
Storage Plugins
The terms "storage plugin" and "provisioner" can be used interchangeably. "Provisioner" is used especially when dynamic provisioning is involved. "Driver" is another equivalent term for storage plugin.
Old storage plugins used to be implemented as part of the main Kubernetes code tree (in-tree storage plugins), which raised a series of problems, such as that all had to be open-source and their release cycle was tied to the Kubernetes release cycle. Newer plugins are based on the Container Storage Interface (CSI) and can be developed out-of-tree.
Plugin Types
kubernetes.io/no-provisioner
kubernetes.io/aws-ebs
kubernetes.io/gce-pd
kubernetes.io/azure-file
Other Provisioners
- quay.io/kubernetes_incubator/nfs-provisioner
- https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner - a local volume static provisioner that manages the persistent volume lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and cleaning up the disks when released. It does not support dynamic provisioning.
Container Storage Interface (CSI)
Container Storage Interface (CSI) is an open standard that provides a clean interface for storage plugins and abstracts the internal Kubernetes storage details. CSI provides means so the external storage can be leveraged in a uniform way across multiple container orchestrators - not only Kubernetes. Both block and filesystem storage can be exposed via CSI.
CSIDriver
The Kubernetes resources supporting the CSIDriver.
kubectl get csidriver
Amazon EFS CSI
Azure CSI
Persistent Volume CSI Configuration
See CSI below.
API Resources
The persistent volume subsystem consists of the following three API resource types that allow applications to consume storage: persistent volumes, persistent volume claims and storage classes:
Storage Class (SC)
A storage class is an API resource that allows the definition of a class or tier of storage, from which an application can then dynamically request storage. Storage classes are not namespaced.
For an overview of how storage classes, volumes and volume claims work together, see Volume and Claim Lifecycle and Binding below.
Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies defined by the cluster administrators. Obviously, the type of storage classes that can be defined depends on the types of external storage the Kubernetes cluster has access to. A pod can use a dynamically-provisioned persistent volume from a specific storage class by using a persistent volume claim that references that storage class by name. The persistent volume that will provide the storage does not need to be created or declared: the storage class creates the persistent volume dynamically. Once deployed, the storage class watches the API server for new PVC objects that reference its name. When a matching persistent volume claim appears, the storage class dynamically creates the required persistent volume.
The storage class resources are defined in the storage.k8s.io/v1 API group. Each storage class object relates to a single provisioner. StorageClass objects are immutable, they cannot be modified once deployed.
Storage Class Manifest
Default Storage Class
For the time being, the default storage class is set via annotations. If the cluster has a default storage class, a pod can be deployed using just a persistent volume claim - the storage class does not need to be manually created.
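The annotation in question is storageclass.kubernetes.io/is-default-class. A minimal sketch of a storage class marked as default follows; the class name, the provisioner and its parameters are assumptions picked for illustration and depend on the storage available to the cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                                         # hypothetical class name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # marks this class as the cluster default
provisioner: kubernetes.io/aws-ebs                       # assumption: an AWS-backed cluster
parameters:
  type: gp2                                              # provisioner-specific parameter (assumption)
```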
Examples
Persistent Volume (PV)
The persistent volume is the API resource that maps onto external storage assets and makes them accessible to the Kubernetes cluster and to applications. Each persistent volume is an object in the Kubernetes cluster that maps back to a specific storage asset (LUN, share, blob, etc.). A single external storage asset can only be used by a single persistent volume.
The persistent volume lasts for the cluster lifetime, unlike a pod volume, which lasts for the pod lifetime.
A pod can use a persistent volume by indicating a persistent volume claim (see below) whose access mode, storage class name and capacity match that of the persistent volume. The pod cannot specify a persistent volume directly, the match is intermediated by the Kubernetes cluster. For an overview of how storage classes, volumes and volume claims work together, see Volume and Claim Lifecycle and Binding below.
From a declarative perspective, to get a persistent volume storage, the pod lists a persistentVolumeClaim volume among the required volumes in its manifest, as shown above.
Difference between a Pod Volume and a Persistent Volume
Persistent Volume Manifest
Access Mode
The binding between a Persistent Volume and its Persistent Volume Claims can be made in one mode only. It is not possible for a persistent volume to have one Persistent Volume Claim bound to a Persistent Volume in ReadOnlyMany mode and another Persistent Volume Claim bound to the same volume in ReadWriteMany mode.
ReadWriteOnce (RWO)
This mode defines a Persistent Volume that can only be bound in read/write mode by a single Persistent Volume Claim. An attempt to bind it via multiple Persistent Volume Claims will fail. Block storage normally supports only RWO.
ReadWriteMany (RWX)
This mode defines a Persistent Volume that can be bound in read/write mode by multiple Persistent Volume Claims. In general, file storage and object storage support RWX.
ReadOnlyMany (ROX)
This mode defines a Persistent Volume that can be bound in read-only mode by multiple Persistent Volume Claims.
Reclaim Policy
The reclaim policy tells Kubernetes what to do with a persistent volume when its persistent volume claim has been released.
Delete
This policy deletes the persistent volume and the underlying associated external storage resource, on the external storage system. This is the default policy for volumes that are created dynamically via a storage class.
Retain
This policy keeps the persistent volume in the cluster, as well as the underlying associated external storage resource, on the external storage system. However, it will prevent another persistent volume claim from using the persistent volume. To reuse the space associated with a retained persistent volume, the persistent volume should be manually deleted, the underlying external storage reformatted and then the persistent volume should be recreated.
Local persistent volumes can only support a "Retain" reclaim policy. The administrator must manually clean up and set up the local volume again for reuse.
Storage Class Name
Capacity
The capacity, expressed in the persistent volume manifest, can be less than the actual underlying physical storage, but cannot be more.
Node Affinity
The persistent volume scheduler uses the node affinity configuration of a local persistent volume to understand what node host the storage for the volume is available on.
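To make the capacity, reclaim policy and node affinity settings above concrete, a local persistent volume could be declared as in the sketch below. The volume name, storage class name, disk path, node name and capacity are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv                 # hypothetical name
spec:
  capacity:
    storage: 100Gi                       # may be less than the physical disk, not more
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain  # the policy local volumes support
  storageClassName: local-storage        # hypothetical class with WaitForFirstConsumer binding
  local:
    path: /mnt/disks/ssd1                # hypothetical path of the disk on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node           # hypothetical node that holds the disk
```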
CSI
driver
volumeHandle
Persistent Volume Claim (PVC)
Pods do not act directly on persistent volumes, they need something called Persistent Volume Claims, which is an API resource object that is bound to the Persistent Volume the pod wants to use. A Persistent Volume Claim is similar to a ticket that authorizes a pod to use a certain Persistent Volume. Once an application has a Persistent Volume Claim, it can mount the respective volume into its pod.
Persistent Volume Claims are namespaced, so their "effective" name is <namespace>/<claim-name>. Two different Persistent Volume Claims with the same name, but declared in different namespaces are different, so if one is bound to a Persistent Volume, the other cannot be bound to the same volume.
A Persistent Volume Claim can be bound to one and only one Persistent Volume. However, multiple pods can use the same Persistent Volume Claim, accessing, and sharing the same Persistent Volume, if the persistent volume storage allows sharing. For an in-depth discussion on how storage classes, volumes and volume claims work together, see Volume and Claim Lifecycle and Binding below.
From a declarative perspective, to get a persistent volume storage, the pod lists a persistentVolumeClaim volume among the required volumes in its manifest, as shown above.
Persistent Volume Claims and Storage Class
A claim may request a particular storage class by specifying its name, using the attribute storageClassName. If the claim expressly requests a class, only the persistent volumes of that class can be bound to the claim. Claims do not necessarily have to request a class. A claim with its storageClassName set to "" is always interpreted to be requesting a persistent volume with no class, so it can only be bound to persistent volumes with no class (no annotation or one set equal to ""). A claim with no storageClassName is not quite the same and is treated differently by the cluster, depending on whether the DefaultStorageClass admission controller is turned on. The DefaultStorageClass admission controller observes creation of PersistentVolumeClaim objects that do not request any specific storage class and automatically adds a default storage class to them. This way, users that do not request any special storage class do not need to care about storage classes at all: they get the default one. When more than one storage class is marked as default, the admission controller rejects the creation of any persistent volume claim with an error, and an administrator must revisit the StorageClass objects and mark only one as default. This admission controller ignores any persistent volume claim updates; it acts only on creation. The admission controller does not do anything when no default storage class is configured: the claims with no explicit storage class will only be bound to matching persistent volumes with no storage class, if any. If the matching persistent volumes belong to an explicit storage class, they won't bind, because the storage classes of the claim and of the persistent volume must match for binding to occur.
An optional persistent volume name can be specified in the persistent volume claim spec, using the volumeName attribute. More qualified content here.
Persistent Volume Claim Manifest
Persistent Volume Claim Template
Volume and Claim Lifecycle and Binding
TODO: next time I am here, process https://kubernetes.io/docs/concepts/storage/persistent-volumes/#lifecycle-of-a-volume-and-claim and then integrate this:
A persistent volume is a cluster-level resource. A persistent volume claim is a request for a persistent volume resource, and acts as a claim check to the resource. To get access to storage, a pod lists a persistent volume claim in its volumes list. The persistent volume claim must exist as an API resource. It usually specifies a storage class. Can a PVC request a specific PV, not a generic PV from a storage class? During the pod deployment, the appropriate persistent volume from the storage class is identified (if it exists), allocated and bound to the persistent volume claim, and thus bound to the pod. A persistent volume can be associated with one and only one persistent volume claim. However, multiple pods can use the same persistent volume claim, thus sharing the persistent volume. The binding between a persistent volume and a persistent volume claim is reflected both in the manifest of the persistent volume claim and in the manifest of the persistent volume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  [...]
spec:
  [...]
  volumeMode: Filesystem
  volumeName: efs-pv
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  [...]
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: efs-claim
    namespace: dev
    resourceVersion: "18663986"
    uid: a139cd2f-3223-4caa-bdd1-9b6d80ca7b1
Dynamic Volume Provisioning
As of 2019, dynamic provisioning of local volumes is under design.
Persistent Volume Controller
The persistent volume controller matches persistent volume claims with suitable persistent volumes. It is part of the controller manager.