Docker Concepts

From NovaOrdis Knowledge Base
Revision as of 01:51, 15 February 2018 by Ovidiu (talk | contribs) (→‎Bind Mount)

External

Internal

Overview

Docker is at the same time a packaging format, a set of tools with server and client components, and a development and operations workflow. Because it defines a workflow, Docker can be seen as a tool that reduces the complexity of communication between the development and the operations teams.

Docker architecture centers on atomic and throwaway containers. During the deployment of a new version of an application, the whole runtime environment of the old version of the application is thrown away with it, including dependencies, configuration, all the way to, but excluding the O/S kernel. This means the new version of the application won't accidentally use artifacts left by the previous release, and the ephemeral debugging changes are not going to survive. This approach also makes the application portable between servers, which act as places where to dock standardized containers.

A Docker release artifact is a single file, whose format is standardized. It consists of a set of layers assembled in an image.

The ideal Docker application use cases are stateless applications or applications that externalize their state in databases or caches: web frontends, backend APIs and short running tasks.

The Linux kernel (see "Architecture" below) has provided support for container technologies for years, but more recently the Docker project has developed a convenient management interface for containers on a host.

Docker Workflow

A Docker workflow represents the sequence of operations required to develop, test and deploy an application in production using Docker.

The Docker workflow largely consists of the following sequence:

  1. Developers build and test a Docker image and ship it to the registry.
  2. Operations engineers provide configuration details and provision resources.
  3. Developers trigger the deployment.

Container Best Practices

Container Best Practices

Architecture

Containers require several kernel-level mechanisms to be available to work correctly:

  • process isolation is provided by the kernel namespaces mechanism.
  • the capability to control a container's access to system resources is provided by the cgroups mechanism.
  • security that comes from separation between the host and the container, and between individual containers is enforced with SELinux.

Namespaces

By default, all containers have the PID namespace and the UTS namespace enabled.

Namespaces

cgroups

https://docs.docker.com/config/containers/runmetrics/#metrics-from-cgroups-memory-cpu-block-io

For each container, one cgroup is created in each hierarchy. The cgroup is "lxc/<container-name>".

More:

cgroups

Container

A Linux container is a lightweight mechanism for isolating running processes, so these processes interact only with designated resources. The process tree runs in a segregated environment provided by the operating system, with restricted access to these resources, and the container allows the administrator to monitor resource usage. Inbound or outbound external access is done via a virtual network adapter. From an application's perspective, it looks like the application is running alone inside its own O/S installation. An image encapsulates all files required to run an application - all the dependencies of an application and configuration - and it can be deployed on any environment that has support for running containers. The same bundle can be assembled, tested and shipped to production without any change. Container images are a packaging technology.

Multiple applications can be run in containers on the same host, and each application won't have visibility into other applications' processes, files, network, etc. Typically, each container provides a single service, often called a microservice. While it is technically possible to run multiple services within a container, this is generally not considered a best practice: the fact that a container provides a single function makes it theoretically easy to scale horizontally.

A Docker container is a Linux container that has been instantiated from a Docker image. Physically, the Docker container is a reference to a layered filesystem image and some configuration metadata (environment variables, for example). The detailed information that goes along with a container can be displayed with docker inspect.

Difference Between Containers and Images - a Writable Layer

Once instantiated, a container represents the runtime instance of the image it was instantiated from. The difference between the image and a container instantiated from it consists of an extra writable layer, which is added on top of the topmost layer of the image. This layer is often called the "container layer". All activity inside the container that adds new data or modifies existing data - writing new files, modifying existing files or deleting files - will result in changes being stored in the writable layer. Any files the container does not change do not get copied in the writable layer, which means the writable layer is as small as possible. When an existing file is modified, the storage driver performs a copy-on-write operation.

The state of this writable layer can be inspected at runtime by logging into the container, or it can be exported with docker export and inspected offline. Because each container has its own writable container layer, which stores the changes that are particular to a specific container, multiple containers can share access to the same underlying image and yet maintain their own state. If multiple containers must share access to the same state, it should be done by storing the data in a Docker volume mounted in all the containers. Docker volumes should also be used for write-heavy applications, which should not store data in the container.

When the container is stopped with docker stop, the writable layer's state is preserved, so when the container is restarted with docker start, the runtime container regains access to it. When the container is deleted with docker rm, the writable layer is discarded so all the changes to the image are lost, but the underlying image remains unchanged.

The size of the writable layer is reported as "size" by docker ps -s.

Container ID

The long value can be obtained with:

docker inspect --format="{{.Id}}" <short-container-ID>|<container-name>

Image a Container is Created From

The name of an image the container was instantiated from can be obtained by running docker ps. The image name is found in the "IMAGE" column.

Logging

https://docs.docker.com/engine/admin/logging/overview/#none
https://docs.docker.com/config/containers/logging/configure/

Container logging consists of content sent to stdout and stderr by the process (or processes) running within the container.

By default, the logging information gets translated into JSON records and written on the Docker server's filesystem in /var/lib/docker/containers/<container-id>/<container-id>-json.log, and it can be accessed with docker logs.
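The json-file driver writes one JSON object per line, with "log", "stream" and "time" keys. A minimal parsing sketch (the sample records below are made up, but the key names match the json-file format):

```python
import json

# Sample records in Docker's json-file log format: one JSON object per
# line, with "log" (raw output), "stream" (stdout/stderr) and "time".
sample = '\n'.join([
    '{"log":"starting server\\n","stream":"stdout","time":"2018-02-15T01:51:00.000000000Z"}',
    '{"log":"bind failed\\n","stream":"stderr","time":"2018-02-15T01:51:01.000000000Z"}',
])

def parse_json_log(text):
    """Parse json-file driver records into (stream, message) tuples."""
    records = [json.loads(line) for line in text.splitlines() if line]
    return [(r["stream"], r["log"].rstrip("\n")) for r in records]

print(parse_json_log(sample))
```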

Container Metadata

Image and Container Metadata

Configuration

The container configuration can be accessed with docker inspect and it can be edited with docker update. It is also available on the docker server under /var/lib/docker/containers/<container-id>. More details about specific files and fields:

Docker Container Configuration

Image

Logically, a Docker image is a set of stacked layers, where each layer represents the result of the execution of a Dockerfile instruction. Each layer is read-only and only contains differences from the layer before it; a writable container layer is added on top only when a container is instantiated. The details related to how these layers interact with each other are handled by the storage driver. Physically, a Docker image is a configuration object that specifies, among other things, an ordered list of layer digests, which enables Docker to assemble a container's filesystem with reference to layer digests rather than parent images.

For differences between an image and a container, see Difference Between Containers and Images above.

The image is produced by the build command, as the sole artifact of the build process. When an image needs to be rebuilt, every layer after the first changed layer must be rebuilt.

The space occupied on disk by a container can be estimated based on the output of the docker ps -s command, which provides size and virtual size information. For accounting of the space occupied by container logging, which may be non-trivial, see logging.

Images are stored and accessed by the cryptographic checksum of their contents (the image ID).

Image Metadata

Each image has an associated JSON structure which describes the image. The metadata includes creation date, author, the ID of the parent image, execution/runtime configuration like its entry point, default arguments, CPU/memory shares, networking, and volumes. The JSON structure also references a cryptographic hash of each layer used by the image, and provides history information for those layers. This JSON structure is considered to be immutable, because changing it would change the computed ImageID. Changing it means creating a new derived image, instead of changing the existing image.

Image and Container Metadata

Image ID

The image ID is a digest calculated by applying the SHA256 algorithm to the image metadata, which, among other things, contains an ordered list of layer digests. The content that goes into calculating the digest can be examined with docker inspect. The first 12 digits of the image ID are displayed as "IMAGE ID" by the docker images command.
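The scheme can be sketched as follows. The configuration content here is made up and heavily simplified (a real image config also carries entrypoint, environment, history, etc.), but it illustrates why the metadata is effectively immutable: any change to it produces a different ID.

```python
import hashlib, json

# Hypothetical, simplified image configuration; real configs also carry
# entrypoint, environment, history, and an ordered list of layer digests.
config = {
    "architecture": "amd64",
    "config": {"Entrypoint": ["/bin/app"]},
    "rootfs": {"diff_ids": ["sha256:aaa...", "sha256:bbb..."]},
}

def image_id(cfg):
    # The ID is the SHA256 digest of the serialized configuration.
    raw = json.dumps(cfg, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(raw).hexdigest()

full_id = image_id(config)
short_id = full_id.split(":")[1][:12]   # what `docker images` displays

# Changing the metadata yields a different ID, i.e. a new derived image.
changed = dict(config, architecture="arm64")
assert image_id(changed) != full_id
```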

Image Name

The image name can be used as argument of the docker pull command.

Base Image

https://docs.docker.com/engine/userguide/eng-image/baseimages/

When an image is assembled from a Dockerfile, the initial image upon which layers are being added is called the base image. A base image has no parents. The base image is specified by the Dockerfile FROM instruction. Once a base image has been used to create a new image with docker build, it becomes the parent image of the newly created image.

This is an article advising on base images to use: https://www.brianchristner.io/docker-image-base-os-size-comparison/. Base images used so far:

Parent Image

An image’s parent image is the image designated in the FROM directive in the image’s Dockerfile. All subsequent commands are applied to this parent image. A Dockerfile with no FROM directive has no parent image, and is called a base image. The parent image ID can be obtained from the image metadata with docker inspect.

Searching for Images

The Docker client command docker search can be used to search for images in Docker Hub or other repositories.

Layer

A layer of a Docker image represents the result of the execution of a Dockerfile instruction. Each layer is identified by a unique long hexadecimal number, a hash, which is usually shortened to 12 digits. Each layer is stored in its own local directory inside Docker's local image registry (however the directory names do not correspond to the layer IDs). The layers are version controlled.

Tag

A tag is an alphanumeric identifier of the images within a repository, and it is generally used to identify a particular release of the image. It is a form of Docker revision control. Tags are needed because applications develop over time, and a single image name can actually refer to many different versions of the same image. An image is uniquely identified by its hash and possibly by one or several tags. An image may be tagged in the local registry when the image is first built, using the -t option of the docker build command, or with the docker tag command. An image may have multiple tags. For example, the "latest" tag may be associated with a specific version tag.
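A repository can be pictured as a mapping from tags to image IDs; several tags may point at the same image, with "latest" aliasing a release tag. A toy sketch with hypothetical IDs:

```python
# A repository maps tags to image IDs; several tags may point to the
# same image. The IDs below are hypothetical.
repo = {}

def tag(repository, name, image_id):
    repository[name] = image_id

tag(repo, "1.2.0", "sha256:0c0359fd3c0d")
tag(repo, "latest", repo["1.2.0"])   # "latest" is just another tag

assert repo["latest"] == repo["1.2.0"]
```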

See:

docker tag

The "latest" Tag

If the docker pull command is used without any explicitly specified tag, "latest" is implied. However, the "latest" tag must exist in the repository on the registry being accessed, for the command to work.

URL

A repository URL. The most generic format is:

[registry][:port][/namespace/]<repository>[:tag]

If not specified, the default registry is "docker.io", the namespace section is "/library/" and the default tag is "latest". More details about "latest".
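Applying those defaults can be sketched with a simplified parser. This is not the reference implementation: real image reference parsing also handles digests and registry ports, which are ignored here.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, namespace, repository, tag),
    applying the documented defaults: docker.io, library, latest.
    Simplified sketch - digests and registry ports are not handled."""
    registry, namespace, tag = "docker.io", "library", "latest"
    if ":" in ref.rsplit("/", 1)[-1]:       # a tag follows the last path part
        ref, tag = ref.rsplit(":", 1)
    parts = ref.split("/")
    if len(parts) == 3:
        registry, namespace, repository = parts
    elif len(parts) == 2:
        namespace, repository = parts
    else:
        repository = parts[0]
    return registry, namespace, repository, tag

print(parse_image_ref("ubuntu"))
print(parse_image_ref("registry.access.redhat.com/rhscl/postgresql-95-rhel7"))
```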

Union Filesystem

Docker uses a union filesystem to combine all layers within an image into a single coherent filesystem.
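The lookup rule of a union filesystem can be modeled with layers as plain dicts: reads start at the topmost layer and the first match wins, so newer layers shadow older ones. The file paths and contents below are made up.

```python
# Toy model of a union filesystem: each layer is a dict of path -> content
# (hypothetical files); lookups go from the topmost layer down.
layers = [
    {"/etc/app.conf": "base config", "/bin/app": "binary"},   # base layer
    {"/etc/app.conf": "patched config"},                      # newer layer
]

def read(path):
    for layer in reversed(layers):        # newest layer first
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)

print(read("/etc/app.conf"))   # the newer layer shadows the base layer
print(read("/bin/app"))        # falls through to the base layer
```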

Dangling Image

An image is said to be "dangling" if it is not associated with a repository name in a registry, usually the local registry:

REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
<none>                               <none>              0c0359fd3c0d        8 seconds ago       1.14MB

Image Metadata

Image and Container Metadata

Dockerfile

A Dockerfile defines how a container should look at build time, and it contains all the steps that are required to create a layered image. Each command in the Dockerfile generates a new layer in the image. The Dockerfile is an argument of the build command, possibly implicit if it is present in the directory the command is run from. For more details, see:

Dockerfile

Docker Image DSL

Docker defines its own Domain Specific Language (DSL) for creating Docker images.

https://docs.docker.com/engine/reference/builder/

.dockerignore

.dockerignore

Build Context

Build Context

Image Repository

A Docker image repository is a collection of different Docker images with the same name, distinguished by different tags.

Repository Name

The repository name can be used as argument of the docker pull command.

Image Registry

https://docs.docker.com/registry/

An image registry is a service for storing and retrieving Docker container images and contains a collection of one or more image repositories. Most image registries are hosted services. Clients interact with the registry using a registry API. The default Docker registry is Docker Hub. The local Docker instance is configured with a number of registries it accesses, which can be listed with docker info:

docker info
...
Registry: https://registry.access.redhat.com/v1/
Insecure Registries:
 172.30.0.0/16
 127.0.0.0/8
Registries: registry.access.redhat.com (secure), registry.access.redhat.com (secure), docker.io (secure)

Other registries:

The docker server can be configured to look up images in arbitrary registries, block registries or allow insecure registries by using the --add-registry, --block-registry and --insecure-registry options in the docker daemon configuration file /etc/sysconfig/docker.

Registry Path

A registry path is similar to a URL, but does not contain a protocol specifier (https://). A registry path can be used as image name prefix when attempting to pull from a different registry than Docker Hub. Example:

registry.access.redhat.com/rhscl/postgresql-95-rhel7

Local Image Registry

Docker caches images downloaded from remote repositories locally, in a local registry. This local registry is also referred to as Docker's local storage area, and it usually lives under /var/lib/docker. Depending on the storage driver in use, the local storage area is mounted under /var/lib/docker/devicemapper, /var/lib/docker/overlay, etc. The content of the local registry can be queried with docker images. Images can be removed from the local registry with docker rmi.

Docker Hub

Docker Hub is a cloud service that offers image registry functionality. It is useful for sharing applications and automating workflows:

https://hub.docker.com

The docker search command searches Docker Hub (by default) for images whose name match the command argument.

More Docker Hub operations:

Docker Hub Operations

Nova Ordis Images:

Nova Ordis Docker Hub Images

Image Operations

Image Operations

Labels

Labels represent metadata in the form of key/value pairs, and they can be specified with the Dockerfile LABEL command. Labels can be applied to containers and images and they are useful in identifying and searching Docker images and containers. Labels applied to an image can be retrieved with docker inspect command.

Docker and Virtualization

Containers implement virtualization above the O/S kernel level.

In the case of O/S virtualization, a virtual machine contains a complete guest operating system and runs its own kernel, on top of the host operating system. The hypervisor that manages the VMs, and the VMs themselves, use a percentage of the system's hardware resources, which are no longer available to the applications.

A container is just another process, with a lightweight wrapper around it, that interacts directly with the Linux kernel, and can utilize resources that otherwise would have gone to the hypervisor and the VM kernel. The container includes only the application and its dependencies. It runs as an isolated process in user space, on the host's operating system. The host and all containers share the same kernel.

A virtual machine is long lived in nature. Containers have usually shorter life spans.

The isolation among containers is much more limited than the isolation among virtual machines. A virtual machine has default hard limits on the hardware resources it can use. Unless explicit limits are placed on the resources containers can use, containers compete for resources.

Docker Revision Control

Docker provides two forms of revision control:

  • Tracking the filesystem layers the images are made up of
  • Tagging of built images

Cloud Platform

Docker is not a cloud platform. It only handles containers on pre-existing Docker hosts. It does not allow creating new hosts, object stores, block storage, and other resources that can be provisioned dynamically by a cloud platform.

Docker Cloud

Security

Docker Security

Dependencies

The Docker workflow allows all dependencies to be discovered during the development and test cycles.

The Docker Client

The Docker client is an executable used to control most of the Docker workflow and communicate with remote servers. The Docker client runs directly on most major operating systems. The same Go executable acts as both client and server, depending on how it is invoked. The client uses the Remote API to communicate with the server.

Client Operations

The Docker Server

The Docker server (also referred to as the Docker daemon) is a process that runs as a daemon and manages the containers, and the client tells the server what to do. The server uses Linux containers and the underlying Linux kernel mechanisms (cgroups, namespaces, iptables, etc.), so it can only run on Linux servers. The same Go executable acts as both client and server, depending on how it is invoked, and it will launch as server only on supported Linux hosts. Each Docker host will normally have one Docker daemon that can manage a number of containers.

The server can talk directly to the image registries when instructed by the client.

The server listens on port 2375 for unencrypted traffic and port 2376 for encrypted traffic.

The Docker server maintains running (and stopped) containers' state under /var/lib/docker/containers/<container-id>. The logs are stored in /var/lib/docker/containers/<container-id>/<container-id>-json.log. More about logging is available here: Logging.

The daemon requires root privileges, so only trusted users should be allowed to control it.

Server Operations

Client/Server Communication

The client and server communicate over network (TCP or Unix) sockets, via a REST API. For details on how to secure the daemon access see: Secure Docker Daemon

Remote API

https://docs.docker.com/engine/api/

Container Networking

https://docs.docker.com/engine/userguide/networking/

A Docker container behaves like a host on a private network. Each container has its own virtual network stack, Ethernet interface and its own IP address. All containers managed by the same server are connected via bridge interfaces to a default virtual network and can talk to each other directly. Logically, they behave like physical machines connected through a common Ethernet switch. In order to get to the host and the outside world, the traffic from the containers goes over an interface called docker0: the Docker server acts as a virtual bridge for outbound traffic. The Docker server also allows containers to "bind" to ports on the host, so outside traffic can reach them: the traffic passes over a proxy that is part of the Docker server before getting to containers.

The default mode can be changed; for example, --net=host configures the server to allow a container to use the host's own network device and address.

Also see:

Network Namespace

Atomic Host

An atomic host is a small, finely tuned operating system image like https://coreos.com or http://www.projectatomic.io, that supports container hosting and atomic OS upgrades.

Entrypoint

See:

Dockerfile Reference - CMD vs. ENTRYPOINT

Exec Form and Shell Form

Both ENTRYPOINT and CMD directives support two different forms: the shell form and the exec form.

When specifying the shell form, the binary is executed with an invocation of the shell using

/bin/sh -c
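The practical consequence is that the shell form gets variable expansion and shell syntax, while the exec form runs the binary directly with literal arguments. This is easy to reproduce outside Docker with a short Python sketch (the GREETING variable is made up; assumes a Unix host with /bin/sh):

```python
import os
import subprocess

os.environ["GREETING"] = "hello"   # hypothetical variable, inherited by children

# Exec form analogue: the binary runs directly, no shell, so $GREETING
# is passed through as a literal argument.
exec_form = subprocess.run(["echo", "$GREETING"],
                           capture_output=True, text=True).stdout.strip()

# Shell form analogue: the command is wrapped in /bin/sh -c, which
# performs variable expansion before running echo.
shell_form = subprocess.run(["/bin/sh", "-c", "echo $GREETING"],
                            capture_output=True, text=True).stdout.strip()

print(exec_form)    # $GREETING
print(shell_form)   # hello
```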

Docker Components

Docker Engine

Docker Engine is a portable runtime and packaging tool.

Docker Storage

https://docs.docker.com/engine/userguide/storagedriver/selectadriver/

Image Storage

Storage Driver/Backend

https://docs.docker.com/engine/userguide/storagedriver/

The Docker storage driver handles details related to how the various layers, including the container layer, interact with each other and how the container image is exposed. Containers and the images they are created from are stored in Docker's storage backend. The Docker server's storage backend communicates with the underlying Linux filesystem to build and manage the multiple image layers that combine to form a single image. The content of the backend can be inspected with docker images (for images) and docker ps (for running and stopped containers).

Some storage concepts, such as base device size, which essentially represents the container's root file system size, only apply to specific storage backends (device-mapper in this case), and they will be mentioned in the corresponding sections.

Backends:

devicemapper Storage Driver

device-mapper

overlayfs Storage Driver

This is the storage driver a RHEL installation defaults to.

overlayfs, overlayfs2

AUFS

AUFS

BTRFS

BTRFS

Copy-on-Write (CoW) Strategy

All storage backend drivers provide a fast CoW (copy-on-write) system for image management. Copy-on-write is a strategy of sharing and copying files for maximum efficiency. If a file or a directory exists in a lower layer of the image, and another layer - including the writable layer - needs read access to it, it just uses the existing file from its original layer. If the file needs to be modified - either at build time, when the container is being built, or at run time, when the container is instantiated - the file is copied into the layer that needs the file, and modified.

For the overlay/overlay2 and AUFS drivers, the copy-on-write operation consists of:

  • Search through the image layers for the file to update. The process starts at the newest layer and works down to the base layer, one layer at a time. When a result is found, it is added to a cache.
  • Perform a copy_up operation on the first copy of the file that is found into the writable layer.
  • Any modifications are made to this copy of the file, and the container cannot see the read-only copy of the file that exists in the lower layer.

A copy_up may incur a significant performance overhead, which depends on the storage driver in use. Large files, many layers and deep directory trees can make the impact more noticeable. However, the copy_up operation only occurs the first time a file is modified.
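The copy_up behavior can be sketched with layers modeled as plain dicts (hypothetical file paths and contents): the first write copies the file from the read-only image layers into the writable layer, and subsequent reads and writes see only that copy.

```python
# Toy model: read-only image layers plus a writable container layer on top.
image_layers = [{"/etc/motd": "welcome"}]     # read-only (hypothetical file)
container_layer = {}                          # writable layer, starts empty

def copy_up(path):
    """Copy a file from the image layers into the writable layer the first
    time it is needed for writing; later operations use only the copy."""
    if path not in container_layer:
        for layer in reversed(image_layers):  # newest layer first
            if path in layer:
                container_layer[path] = layer[path]
                break

def write(path, content):
    copy_up(path)
    container_layer[path] = content

write("/etc/motd", "edited")
print(container_layer["/etc/motd"])   # the change lives in the writable layer
print(image_layers[0]["/etc/motd"])   # the image layer is untouched
```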

Loopback Storage

The default loopback storage, while appropriate for proof of concept environments, is not suitable for production.

Non-Image State Storage

Data Volume

https://docs.docker.com/engine/admin/volumes/volumes/

A Docker volume, also referred to as data volume, is a directory or a file in the Docker host's filesystem that is mounted directly into a container and that bypasses the union filesystem. Data volumes are not controlled by a storage driver, and reads and writes bypass the storage driver and operate at native host speeds. Any number of volumes can be mounted in a container. Multiple containers can also share one or more data volumes.

Data volumes should be used when multiple containers need to share filesystem-based state, and also when a container performs write-heavy operations: such data should not be stored in the container's writable layer, which does not perform well, but in a Docker volume, which is designed for efficient I/O.

The data volumes are defined by the Dockerfile VOLUME directive and bound at run time with the -v, --volume or --mount flags. For an existing image or container, the data volume definition (Config.Volumes) and bindings (Mounts and HostConfig.Mounts) are available with docker inspect.

Native Host Path Permissions

The native host path will be accessed from inside the container with the user ID of the user running the container, so the mount point has to have sufficient permission to allow file operations.

Data Volume Container

Bind Mount

https://docs.docker.com/engine/admin/volumes/bind-mounts/

A bind mount is a file or directory on the container server host that is exposed to one (or more) containers. Bind mounts are useful for operations such as synchronizing the time between the host and containers, by exposing the host's /etc/localtime, as read-only, to containers.

Example:

docker run ... --mount type=bind,source=<host-file-or-dir>,target=<in-container-mount-point> ...

The process running inside the container has direct access to the directory on the host, with the UID it runs with in the container, so be aware of the security implications when providing bind mounts.

Bind Mounts vs. Data Volumes

When using a bind mount, an existing host file or directory is exposed to the container. By contrast, in case of a volume, a new directory is created within Docker's storage directory on the host machine, and Docker manages that directory's content.

Environment Variables

Containerized applications must avoid maintaining configuration in filesystem files - if they do, it limits the reusability of the container. A common pattern used to handle application configuration is to move configuration state into environment variables that can be passed to the application from the container. Docker supports environment variables natively; they are stored in the metadata that makes up a container configuration, and restarting the container ensures the same configuration is passed to the application each time.
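On the application side, the pattern amounts to reading the environment with sensible defaults, as in this sketch (the variable names and defaults are made up; the variables themselves would be passed with docker run -e):

```python
import os

# Configuration read from environment variables, as passed to the
# container with `docker run -e`. Names and defaults are hypothetical.
def load_config(environ=os.environ):
    return {
        "db_host": environ.get("APP_DB_HOST", "localhost"),
        "db_port": int(environ.get("APP_DB_PORT", "5432")),
    }

# Simulate the environment a container run would provide.
cfg = load_config({"APP_DB_HOST": "db.example.com"})
print(cfg)
```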

Container Downward API

Container Downward API

Config

https://docs.docker.com/engine/swarm/configs/

Docker Projects

Image Building

Builder Pattern

https://blog.alexellis.io/mutli-stage-docker-builds/

The practice of maintaining one Dockerfile for development and a corresponding Dockerfile for production. The development Dockerfile contains the tools and libraries needed to build the application. The production Dockerfile is a slimmed-down version of the development Dockerfile, which only contains the application artifacts and exactly what is needed to run them. However, maintaining two related Dockerfiles is not ideal. An alternative is to use a multi-stage build.

Build Cache

docker build - The Build Process

Multi-Stage Build

docker build - Multi-Stage Build

Docker on Mac

Docker on Mac

Resource Management

https://fabiokung.com/2014/03/13/memory-inside-linux-containers/

The Docker runtime will kill a process that attempts to exceed its resource limits, such as the memory limit specified with -m.

The resource usage statistics for running containers can be displayed with:

docker stats

Controlling CPU

CPU Share Constraint

https://docs.docker.com/engine/reference/run/#cpu-share-constraint

Also see:

cgroups - Controlling Relative Share of CPU

CPU Quota Constraint

https://docs.docker.com/engine/reference/run/#cpu-quota-constraint

Also see:

cgroups - Controlling CPU Throttling