Docker Concepts

From NovaOrdis Knowledge Base

Internal

Overview

Docker is at the same time a packaging format, a set of tools with server and client components, and a development and operations workflow. Because it defines a workflow, Docker can be seen as a tool that reduces the complexity of communication between the development and the operations teams.

Docker architecture centers on atomic and throwaway containers. During the deployment of a new version of an application, the whole runtime environment of the old version is thrown away with it, including dependencies and configuration, all the way down to, but excluding, the O/S kernel. This means the new version of the application won't accidentally use artifacts left by the previous release, and ephemeral debugging changes will not survive. This approach also makes the application portable between servers, which act as places to dock standardized containers.

A Docker release artifact is a single file, whose format is standardized. It consists of a set of layers assembled in an image.

The ideal Docker application use cases are stateless applications or applications that externalize their state in databases or caches: web frontends, backend APIs and short running tasks.

The Linux kernel has provided support for container technologies for years, but more recently the Docker project has developed a convenient management interface for containers on a host.

Docker Workflow

A Docker workflow represents the sequence of operations required to develop, test and deploy an application in production using Docker.

The Docker workflow largely consists of the following sequence:

  1. Developers build and test a Docker image and ship it to the registry.
  2. Operations engineers provide configuration details and provision resources.
  3. Developers trigger the deployment.

Container

A Linux container is a lightweight mechanism for isolating running processes, so these processes interact only with designated resources. The process tree runs in a segregated environment provided by the operating system, with restricted access to these resources, and the container allows the administrator to monitor resource usage. Inbound or outbound external access is done via a virtual network adapter. From an application's perspective, it looks like the application is running alone inside its own O/S installation. Multiple applications can be run in containers on the same host, and each application won't have visibility into other applications' processes, files, network, etc. Typically, each container provides a single service, often called a microservice. While it is technically possible to run multiple services within a container, this is generally not considered a best practice: the fact that a container provides a single function makes it theoretically easy to scale horizontally.

A Docker container is a Linux container that has been instantiated from a Docker image.

Difference Between Containers and Images

Once instantiated, a container represents the runtime instance of the image it was instantiated from. The difference between the image and a container instantiated from it consists of an extra writable layer, which is added on top of the topmost layer of the image. All the activity inside the container that adds new data or modifies existing data results in these changes being stored in the writable layer.

When the container is deleted (not simply stopped: a stopped container that is restarted regains access to its writable layer), the writable layer is discarded, so all the changes made inside the container are lost, but the underlying image remains unchanged.
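The relationship between read-only image layers and the container's writable layer can be illustrated with a small model. This is only a sketch: it uses Python's collections.ChainMap as a stand-in for a layered filesystem, and the paths, file contents and the start_container helper are hypothetical names chosen for illustration, not part of Docker.

```python
from collections import ChainMap

# Read-only image layers, topmost first. Each layer maps a path to file content.
image_layers = [
    {"/app/server.py": "v2"},                        # topmost image layer
    {"/app/server.py": "v1", "/etc/app.conf": "x"},  # lower image layer
]


def start_container(layers):
    """Return a container view: a fresh, empty writable layer on top of the image layers."""
    return ChainMap({}, *layers)


container = start_container(image_layers)

# Reads fall through the layers from top to bottom.
top_version = container["/app/server.py"]

# Writes always land in the writable layer (the first mapping of the ChainMap).
container["/tmp/debug.log"] = "scratch data"   # new data
container["/etc/app.conf"] = "edited"          # modified data shadows the image's copy

writable_layer = container.maps[0]
```

Deleting the container corresponds to discarding writable_layer. A second container started from the same image_layers gets a new empty writable layer and sees the original, unmodified files.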

Image

For differences between an image and a container, see Difference Between Containers and Images above.

Base Image

When an image is assembled from a Dockerfile, the initial image upon which layers are being added is called the base image. A base image has no parents. The base image is specified by the Dockerfile FROM instruction.

Searching for Images

The Docker client command search can be used to search for images in Docker Hub or other repositories.

Layer

Tag

An image may be tagged in the local registry when the image is first built, using the -t option of the docker build command, or with the docker tag command. An image may have multiple tags.

Dockerfile

A Dockerfile defines how a container should look at build time, and it contains all the steps that are required to create a layered image. Each command in the Dockerfile generates a new layer in the image. The Dockerfile is an argument (possibly implicit, if present in the directory the command is run from) of the build command. For more details, see:

Dockerfile

.dockerignore

.dockerignore

Image Repository

A Docker image repository is a collection of different Docker images with the same name that have different tags.

Image Registry

An image registry is a hosted service that publishes repositories of images. Clients interact with the registry using a registry API. The default Docker registry is Docker Hub.
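The way registry, repository and tag combine into a single image reference can be sketched as follows. This is a simplified illustration, not Docker's actual parser: it ignores digests (name@sha256:...) and the library/ prefix that Docker Hub applies to official images, and the function name parse_image_reference is hypothetical. It assumes the usual conventions that a missing tag means latest and that the first path component is a registry host only if it contains a dot or a colon, or is localhost.

```python
def parse_image_reference(reference):
    """Split 'registry/repository:tag' into (registry, repository, tag). Simplified."""
    # A colon that appears after the last slash separates the repository from the tag.
    # A colon before the last slash belongs to a registry port instead.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        name, tag = reference[:colon], reference[colon + 1:]
    else:
        name, tag = reference, "latest"  # assumption: default tag

    # Decide whether the first path component names a registry host.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name  # Docker Hub as the default registry
    return registry, repository, tag
```

For example, parse_image_reference("registry.example.com:5000/team/app:1.2") yields the registry host, the repository team/app and the tag 1.2, where registry.example.com is a placeholder host.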

Local Image Registry

Docker caches images downloaded from remote repositories locally, in a local registry. The content of the local registry can be queried with docker images. Images can be removed from the local registry with docker rmi.

Docker Hub

Docker Hub is a cloud service that offers image registry functionality. It is useful for sharing applications and automating workflows:

https://hub.docker.com

The client search command searches Docker Hub (by default) for images whose names match the command argument.

Labels

Labels represent metadata in the form of key/value pairs, and they can be specified with the Dockerfile LABEL instruction. Labels can be applied to containers and images, and they are useful in identifying and searching Docker images and containers. Labels applied to an image can be retrieved with the docker inspect command.

Docker and Virtualization

Containers implement virtualization above the O/S kernel level.

In the case of O/S virtualization, a virtual machine contains a complete guest operating system and runs its own kernel, on top of the host operating system. Both the hypervisor that manages the VMs and the VMs themselves use a percentage of the system's hardware resources, which are no longer available to the applications.

A container is just another process, with a lightweight wrapper around it, that interacts directly with the Linux kernel, and can utilize resources that would otherwise have gone to the hypervisor and the VM kernel. The container includes only the application and its dependencies. It runs as an isolated process in user space, on the host's operating system. The host and all containers share the same kernel.

A virtual machine is long lived in nature. Containers usually have shorter life spans.

The isolation among containers is much more limited than the isolation among virtual machines. A virtual machine has default hard limits on the hardware resources it can use. Containers, unless explicit limits are placed on the resources they can use, compete for resources.

Docker Revision Control

Docker provides two forms of revision control:

  • Tracking the filesystem layers the images are made up of
  • Tagging of built images

Cloud Platform

Docker is not a cloud platform. It only handles containers on pre-existing Docker hosts. It does not create new hosts, object stores, block storage, or other resources that can be provisioned dynamically by a cloud platform.

Security

Production containers should almost always be run under the context of a non-privileged user. See Dockerfile USER.

Privileged Container

A privileged container is also referred to as an infrastructure container.

Dependencies

The Docker workflow allows all dependencies to be discovered during the development and test cycles.

The Docker Client

The Docker client is an executable used to control most of the Docker workflow and communicate with remote servers. The Docker client runs directly on most major operating systems. The same Go executable acts as both client and server, depending on how it is invoked. The client uses the Remote API to communicate with the server.

Client Operations

The Docker Server

The Docker server (also referred to as the Docker daemon) is a process that runs as a daemon and manages the containers, and the client tells the server what to do. The server uses Linux containers and the underlying Linux kernel mechanisms (cgroups, namespaces, iptables, etc.), so it can only run on Linux servers. The same Go executable acts as both client and server, depending on how it is invoked, and it will launch as server only on supported Linux hosts. Each Docker host will normally have one Docker daemon that can manage a number of containers.

The server can talk directly to the image registries when instructed by the client.

When configured to listen on TCP, the server conventionally uses port 2375 for non-encrypted traffic and port 2376 for encrypted traffic.

Server Operations

Client/Server Communication

The client and server communicate over TCP or Unix sockets.
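As an illustration of this communication, the sketch below sends a plain HTTP request to the daemon over a Unix socket. It is a minimal example under stated assumptions, not how the Docker client is implemented: /var/run/docker.sock is the commonly used default socket path and may differ on a given host, /version is an endpoint of the Remote API, and the helper names build_request and query_daemon are hypothetical.

```python
import socket

# Assumption: the daemon listens on the commonly used default Unix socket path.
DOCKER_SOCKET = "/var/run/docker.sock"


def build_request(path):
    """Build a minimal HTTP/1.1 GET request for a Remote API path."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: docker\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def query_daemon(path, socket_path=DOCKER_SOCKET):
    """Send the request over the Unix socket and return the raw HTTP response text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

On a host with a running daemon and sufficient permissions on the socket, print(query_daemon("/version")) would show the response headers followed by a JSON body.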

Remote API

https://docs.docker.com/engine/api/

cgroups

cgroups

Namespaces

Namespaces

Container Networking

https://docs.docker.com/engine/userguide/networking/

A Docker container behaves like a host on a private network. Each container has its own virtual Ethernet interface and its own IP address. All containers managed by the same server are on a default virtual network together and can talk to each other directly. In order to get to the host and the outside world, the traffic from the containers goes over an interface called docker0: the Docker server acts as a virtual bridge for outbound traffic. The Docker server also allows containers to "bind" to ports on the host, so outside traffic can reach them: the traffic passes over a proxy that is part of the Docker server before getting to containers.

The default mode can be changed, for example running a container with --net=host lets the container use the host's own network device and address.

Docker Projects

Boot2Docker

Boot2Docker is deprecated.

Docker Machine

https://github.com/docker/machine

"Docker Up and Running" Page 54.

Docker Compose

https://github.com/docker/compose

Docker Swarm

https://github.com/docker/swarm/

Atomic Host

An atomic host is a small, finely tuned operating system image like https://coreos.com or http://www.projectatomic.io, that supports container hosting and atomic OS upgrades.

Logging

https://docs.docker.com/engine/admin/logging/overview/#none

Entrypoint

An ENTRYPOINT defines the default executable for the image, much as CMD does, and it can be overridden with:

docker run ... --entrypoint <other-entrypoint>

However, CMD can be overridden more easily, by just specifying the command on the command line without any flag, so ENTRYPOINT should be used in scenarios where you want the container to behave exclusively as if it were the executable specified by ENTRYPOINT. In other words, use it when you don't want or expect the user to override the executable.

When both an ENTRYPOINT and CMD are specified, the CMD string(s) will be appended to the ENTRYPOINT in order to generate the container's command string. When using ENTRYPOINT and CMD together it's important that you always use the exec form of both instructions.
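The way ENTRYPOINT and CMD combine can be modeled in a few lines. This is a sketch of the rule described above for exec-form values only, not Docker's implementation; the function name container_command and the ping example are hypothetical. It assumes that arguments given after the image name on the docker run command line replace CMD while leaving ENTRYPOINT in place.

```python
def container_command(entrypoint, cmd, run_args=None):
    """Compose the argument vector a container starts with.

    entrypoint and cmd are exec-form lists taken from the image;
    run_args are the arguments given after the image name on 'docker run'.
    """
    # Arguments supplied at run time replace CMD; otherwise CMD supplies the defaults.
    arguments = run_args if run_args else cmd
    # The chosen arguments are appended to ENTRYPOINT.
    return list(entrypoint) + list(arguments)


# Image with ENTRYPOINT ["ping", "-c", "3"] and CMD ["localhost"]
default_command = container_command(["ping", "-c", "3"], ["localhost"])
overridden_command = container_command(["ping", "-c", "3"], ["localhost"], ["example.com"])
```

Here default_command pings localhost, while overridden_command keeps the ping executable and its options but targets example.com, which is the behavior that makes the container act like the executable itself.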

To determine the values of these instructions, run:

docker inspect <image-id>

They will be available as "Cmd" and "Entrypoint".

Exec Form and Shell Form

Both ENTRYPOINT and CMD directives support two different forms: the shell form and the exec form.

When specifying the shell form, the binary is executed with an invocation of the shell using

/bin/sh -c
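The difference between the two forms can be sketched as a function that turns the value of an ENTRYPOINT or CMD instruction into the argument vector that gets executed. The exec form is written as a JSON array and runs the binary directly, without a shell; the shell form is a plain string handed to /bin/sh -c. The function name to_argv is hypothetical, and the sketch assumes the value is already known to be valid in one of the two forms.

```python
import json


def to_argv(instruction_value):
    """Return the argv for an ENTRYPOINT/CMD value written in exec or shell form."""
    text = instruction_value.strip()
    if text.startswith("["):
        # Exec form: a JSON array of strings, executed directly without a shell.
        return json.loads(text)
    # Shell form: the whole string is passed to the shell as a single argument.
    return ["/bin/sh", "-c", text]


exec_form = to_argv('["nginx", "-g", "daemon off;"]')
shell_form = to_argv("nginx -g 'daemon off;'")
```

In the shell form the process started in the container is the shell rather than the binary, so shell features such as variable expansion are available, at the cost of an extra process between the container and the application.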

Docker Components

Docker Engine

Docker Engine is a portable runtime and packaging tool.