Docker Storage Concepts



Image Storage

The Docker server stores the layers that form images, as well as the layers used by running containers, in a dedicated storage system called the storage backend, which is managed by a pluggable storage driver.

Storage Driver

The Docker storage driver handles the details of how the various layers, including the container layer, interact with each other and how the container image is exposed. Containers and the images they are created from are stored in Docker's storage backend. The Docker server's storage backend communicates with the underlying Linux filesystem to build and manage the multiple image layers that combine to form a single image. Some storage concepts, such as the base device size, which essentially represents the container's root filesystem size, only apply to a specific storage backend (device-mapper in this case); they are mentioned in the corresponding sections.

To determine what kind of storage backend is in use by a specific server instance, execute docker info. Storage backend details are provided as part of the result:

docker info
...
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data Space Used: 472.4MB
 Data Space Total: 51GB
 Data Space Available: 50.53GB
 Metadata Space Used: 159.7kB
 Metadata Space Total: 532.7MB
 Metadata Space Available: 532.5MB
 Thin Pool Minimum Free Space: 5.1GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.146-RHEL7 (2018-01-22)
...
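When only the storage driver name is needed, docker info also supports Go template formatting. A minimal sketch, assuming a reasonably recent Docker client; the Driver field of the info structure holds the storage driver name:

docker info --format '{{.Driver}}'
devicemapper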

The layers that make up the images, and the running containers, are all stored in the same storage backend. The details differ with the driver. For the devicemapper driver, the metadata identifying the image layers for the images reported by docker images is stored under /var/lib/docker/image/devicemapper/layerdb/ and refers to block storage allocated in the block device associated with the devicemapper system. The metadata identifying the layers associated with the running containers reported by docker ps is stored under /var/lib/docker/containers/, /var/lib/docker/image/devicemapper/imagedb/ and /var/lib/docker/image/devicemapper/layerdb/, and also refers to block storage allocated in the block device associated with the devicemapper system.
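A minimal sketch of how this metadata can be inspected on a devicemapper host, assuming the default /var/lib/docker data root; <layer-id> stands for one of the sha256-named subdirectories:

# each subdirectory corresponds to one image layer
ls /var/lib/docker/image/devicemapper/layerdb/sha256/
# cache-id ties the layer metadata to the storage allocated by the driver
cat /var/lib/docker/image/devicemapper/layerdb/sha256/<layer-id>/cache-id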

Available storage drivers:

devicemapper Storage Driver

device-mapper

overlayfs Storage Driver

This is the storage driver a RHEL installation will default to. Ubuntu installations also prefer it if they run 4.x kernels.

overlay, overlay2

AUFS

BTRFS

Copy-on-Write (CoW) Strategy

All storage backend drivers provide a fast copy-on-write (CoW) system for image management. Copy-on-write is a strategy of sharing and copying files for maximum efficiency. If a file or a directory exists in a lower layer of the image, and another layer, including the writable layer, needs read access to it, the existing file from its original layer is used. If the file needs to be modified, either at build time, when the image is being built, or at run time, when the container is instantiated, the file is copied into the layer that needs it and modified there.

For the overlay/overlay2 or AUFS drivers, the copy-on-write operation consists of:

  • Search through the image layers for the file to update. The process starts at the newest layer and works down to the base layer, one layer at a time. When a result is found, it is added to a cache.
  • Perform a copy_up operation on the first copy of the file that is found, copying it into the writable layer.
  • Make any modifications to this copy of the file; the container can no longer see the read-only copy of the file that exists in the lower layer.

A copy_up may incur a significant performance overhead, which depends on the storage driver in use. Large files, many layers and deep directory trees can make the impact more noticeable. However, the copy_up operation only occurs the first time a given file is modified.
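A minimal sketch that makes copy_up visible, assuming the overlay2 driver; the cow-demo container name is arbitrary, and GraphDriver.Data.UpperDir is the docker inspect field through which overlay2 exposes the writable layer's host directory:

docker run -d --name cow-demo ubuntu sleep infinity
# the writable layer starts out empty
sudo ls "$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' cow-demo)"
# modifying a file that lives in a lower, read-only image layer triggers copy_up
docker exec cow-demo sh -c 'echo modified >> /etc/bash.bashrc'
# the copied-up file now appears in the writable layer on the host
sudo ls "$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' cow-demo)/etc"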

Loopback Storage

The default loopback storage, while appropriate for proof of concept environments, is not suitable for production.
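One way to tell whether a devicemapper server is running on loopback storage is to look for the loop file lines in the docker info output. A minimal sketch; the paths shown are the defaults for a loopback (loop-lvm) configuration:

docker info | grep -i 'loop file'
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata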

Container-Generated Data Storage

Data Volume

https://docs.docker.com/engine/admin/volumes/volumes/

A Docker volume, also referred to as a data volume, is a directory or a file in the Docker host's filesystem that is mounted directly into a container and that bypasses the union filesystem. Data volumes are not controlled by a storage driver; reads and writes bypass the storage driver and operate at native host speeds. Any number of volumes can be mounted in a container, and multiple containers can also share one or more data volumes.

Data volumes should be used when multiple containers need to share filesystem-based state, and also when a container performs write-heavy operations: that data should not be stored in the container's writable layer, because storing state in the container's filesystem will not perform well, but in a Docker volume, which is designed for efficient I/O.
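A minimal sketch of two containers sharing state through a single volume; the shared-state volume name and the container names are arbitrary:

docker volume create shared-state
# the first container writes into the volume...
docker run -d --name writer -v shared-state:/data alpine \
    sh -c 'while true; do date >> /data/log; sleep 1; done'
# ...and a second container reads the same data, at native host speed
docker run --rm -v shared-state:/data alpine tail /data/log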

Data volumes are defined by the Dockerfile VOLUME directive and bound at run time with the -v, --volume or --mount flags. For an existing image or container, the data volume definitions (Config.Volumes) and bindings (Mounts and HostConfig.Mounts) are available with docker inspect.
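A minimal sketch of retrieving those fields with docker inspect's Go template formatting; <image> and <container> are placeholders for an actual image and container name:

docker inspect --format '{{json .Config.Volumes}}' <image>
docker inspect --format '{{json .Mounts}}' <container>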

The native host path will be accessed from inside the container with the UID of the user running the container, so the mount point has to have sufficient permissions to allow file operations. For more details on how host-level files are exposed to containers, see UID/GID Mapping.

Named Volume

Named volumes can be mounted with ... --mount type=volume,source=<volume-name>,destination=....
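A minimal sketch; the app-data volume name and the mount point are arbitrary:

docker volume create app-data
docker run --rm --mount type=volume,source=app-data,destination=/var/lib/app \
    alpine ls /var/lib/app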

Anonymous Volume

Anonymous volumes can be created and mounted with ... --mount type=volume,destination=<path-in-container> ... (note there is no "source").
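A minimal sketch; the anon-demo container name is arbitrary, and the anonymous volume shows up in docker volume ls under a generated 64-hex-character name:

docker run -d --name anon-demo --mount type=volume,destination=/scratch alpine sleep 300
docker volume ls
docker inspect --format '{{json .Mounts}}' anon-demo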

Volume Driver

Local Volume Driver

The default volume driver is the "local" driver - a locally scoped volume driver. Note that multiple containers writing to a single shared volume can cause data corruption if the software running inside the container is not designed to handle concurrent processes writing to the same location.
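The local driver accepts mount(8)-style options. A minimal sketch that backs a volume with tmpfs; the scratch-tmpfs name and the 64m size are arbitrary:

docker volume create --driver local \
    --opt type=tmpfs --opt device=tmpfs --opt o=size=64m \
    scratch-tmpfs
docker run --rm --mount type=volume,source=scratch-tmpfs,destination=/scratch \
    alpine df -h /scratch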

Bind Mount

https://docs.docker.com/engine/admin/volumes/bind-mounts/

A bind mount is a file or directory on the container server host that is exposed to one or more containers. Bind mounts are useful for operations such as synchronizing the time between the host and its containers, by exposing the host's /etc/localtime, as read-only, to the containers.

Example:

docker run ... --mount type=bind,source=<host-file-or-dir>,target=<in-container-mount-point> ...
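The /etc/localtime use case mentioned above, as a minimal sketch; readonly is the --mount key that makes the bind mount read-only inside the container:

docker run --rm \
    --mount type=bind,source=/etc/localtime,target=/etc/localtime,readonly \
    alpine date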

The process running inside the container will gain direct access to the directory or file on the host. Unless user namespace remapping is configured, the UID the process runs with inside the container is the same UID used to access the host-level file, so a container process running as UID 0 accesses the file as the Docker server host's root user. This is a security implication that you should be aware of when providing bind mounts.

For more details on how host-level files are exposed to the containers, see UID/GID Mapping.

Bind Mounts vs. Data Volumes

When using a bind mount, an existing host file or directory is exposed to the container. By contrast, in the case of a volume, a new directory is created within Docker's storage directory on the host machine, and Docker manages that directory's content.
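A minimal sketch that makes the difference visible through docker inspect; the container names, the demo-vol volume and the /tmp/demo host path are arbitrary:

docker run -d --name vol-demo -v demo-vol:/data alpine sleep 300
docker run -d --name bind-demo -v /tmp/demo:/data alpine sleep 300
# "Type": "volume", with a "Source" under /var/lib/docker/volumes/
docker inspect --format '{{json .Mounts}}' vol-demo
# "Type": "bind", with "Source": "/tmp/demo"
docker inspect --format '{{json .Mounts}}' bind-demo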

UID/GID Mapping

This subject is relevant for data volumes and bind mounts.
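A minimal sketch showing the UID pass-through on a bind mount; the 1234 UID and the /tmp/uid-demo path are arbitrary:

mkdir -p /tmp/uid-demo
docker run --rm -u 1234:1234 \
    --mount type=bind,source=/tmp/uid-demo,target=/work \
    alpine touch /work/file
# the file is owned by UID 1234 on the host
ls -ln /tmp/uid-demo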

Container-Generated Data Storage Operations