Linux Virtualization Concepts
External
- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Getting_Started_Guide/index.html
- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/index.html
Internal
Generic Virtualization Concepts
Virtualization Solutions from Red Hat
RHEL 7 includes a hypervisor and a number of virtualization tools, which allow it to run guest operating systems and function as a virtualization platform. However, this solution supports a limited number of guests per host and a limited range of guest types. Red Hat Virtualization is an enterprise virtualization solution based on the KVM technology, offering more features than Red Hat Enterprise Linux. Red Hat OpenStack Platform supports OpenStack clouds.
KVM (Kernel-based Virtual Machine)
KVM is a hypervisor that only runs on systems whose hardware supports virtualization extensions. It is built into the standard RHEL 7 kernel, and it can run Linux, Windows, Solaris and BSD guests. KVM is integrated with the Quick Emulator (QEMU) and is managed with the libvirt API. The virtual machines are executed as multi-threaded Linux processes, controlled by tools built on top of libvirt. KVM supports overcommitting, kernel same-page merging (KSM), disk I/O throttling, automatic NUMA balancing and virtual CPU hot add.
Xen
Xen can do full virtualization on systems that support virtualization extensions, but it can also work as a hypervisor on machines that do not, by using paravirtualization.
Quick Emulator (QEMU)
QEMU is a hosted virtual machine monitor. On its own it performs full hardware emulation; combined with KVM, it provides device emulation while KVM handles the hardware-accelerated CPU and memory virtualization.
QEMU Guest Agent
The QEMU guest agent runs on the guest operating system and makes it possible for the host machine to issue commands to the guest operating system.
libvirt
libvirt is an Open Source framework providing a hypervisor-independent, C-based API that can be used to manage KVM as well as Xen. libvirt's clients are virtualization management tools such as virsh. libvirt provides access to the management of virtual machines on a host. The management tools do not need to be on the same physical machine as the hosts they manage. The API can be used to provision, create, modify, monitor, control, migrate and stop virtual machines. Resources such as CPUs, memory, storage, networking and non-uniform memory access (NUMA) partitions can be listed with libvirt. libvirt delegates management operations to the hypervisor, so only operations supported by the hypervisor can be performed with libvirt.
virsh, the primary command-line virtualization management tool, is based on libvirt.
libvirtd
libvirtd is a server-side daemon that runs on the virtualization host and performs management tasks for virtualized guests. libvirtd interacts with the hypervisor via libvirt. The service is configured to automatically start at boot during libvirt package installation. The service's functions are:
- Start and stop guests.
- Manage storage for use by guests.
- Manage networking, specifically the virtual network switch.
- Migrate guests between virtualization host servers.
The associated configuration file is /etc/sysconfig/libvirtd.
libvirt client libraries and utilities connect to this daemon to issue tasks and collect information about the configuration and resources of the host system and guests. The daemon may optionally listen on a TCP port. Restarting libvirtd does not impact the running guests: guests continue to operate and will be picked up automatically after the restart if their XML configuration has been defined.
libvirt-guests
This is a service that automatically saves guests to disk when the host shuts down, and restores them to their pre-shutdown state when the host reboots. Note that this service is not automatically enabled; it must be enabled explicitly.
The associated configuration file is /etc/sysconfig/libvirt-guests.
virtio
virtio is a package that provides KVM hypervisor-specific code and exposes paravirtualized devices to the guest operating system. virtio is a layer sitting between the hypervisor and the guest. All virtio devices have two parts: the host device and the guest driver. The paravirtualized device drivers allow the guest operating system to access the corresponding physical device installed on the host system, and they must be installed on the guest operating system. Examples of paravirtualized devices: the paravirtualized network device (virtio-net); the paravirtualized block device (virtio-blk), a high-performance virtual storage device supported by the hypervisor; the paravirtualized SCSI controller device (virtio-scsi); and others (clock, virtio-serial, virtio-balloon, virtio-rng, the QXL graphics card).
Also see paravirtualization and paravirtualized devices.
KVM Virtual Machine
A KVM guest has two components: the VM definition, expressed in XML, and the VM image, usually maintained on a storage volume in one of the formats described below. The virtual machine, and implicitly its definition, can be created on the command line with virt-install. The VM definition can then be edited with virsh edit and exported as XML with virsh dumpxml. An XML definition file can be used to create a virtual machine with virsh define.
Guests can be started with virsh start and stopped with virsh shutdown.
KVM Virtual Machine States
- running - The domain is currently running on a CPU.
- idle - The domain is idle, and not running or runnable. This can happen because the domain is waiting on I/O (a traditional wait state) or because it has gone to sleep, having nothing else to do.
- paused - The domain has been paused, usually as a result of the administrator running virsh suspend. While in the paused state, the domain still consumes allocated resources such as memory, but is not eligible for scheduling by the hypervisor.
- shutdown - The domain is in the process of shutting down, i.e. the guest operating system has been notified and should be in the process of stopping its operations gracefully.
- shut off - The domain is not running. Usually this indicates the domain has been shut down completely, or has not been started.
- crashed - The domain has crashed, which is always a violent ending. Usually this state can only occur if the domain has been configured not to restart on crash.
- dying - The domain is in the process of dying, but has not completely shut down or crashed.
- pmsuspended - The domain has been suspended by guest power management, e.g. it has entered the S3 state.
The state of a virtual machine is reported by virsh domstate.
KVM Virtual Machine XML Definition
The XML definition of a KVM guest can be generated with virsh dumpxml. An example of a virtual machine XML definition is available here:
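A minimal sketch of what such a definition looks like is shown below; the element names follow the libvirt domain schema, but the guest name and all values are purely illustrative. The Python one-liner at the end only confirms the file is well-formed XML.

```shell
# Write a minimal, hypothetical KVM domain definition, of the kind
# virsh dumpxml produces; a file like this could be registered with
# "virsh define demo-guest.xml" and started with "virsh start demo-guest".
cat > demo-guest.xml <<'EOF'
<domain type='kvm'>
  <name>demo-guest</name>
  <memory unit='MiB'>1024</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
</domain>
EOF
# Sanity check: parse the definition and print the guest name.
python3 -c "import xml.etree.ElementTree as ET; \
print(ET.parse('demo-guest.xml').getroot().find('name').text)"
```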
KVM Virtual Machine Snapshot
The snapshot of a virtual machine consists of a copy of its XML definition and snapshots of all of its attached disk images, so the virtual machine can be recreated in that specific state. A snapshot can be taken manually, by saving the XML configuration of the guest and the state of its storage, or it can involve the virsh snapshot functionality, in which case formal snapshots are recorded as part of the state of the virtualization host and can optionally include the state of the guest memory. Both procedures are described below:
Autostart
A guest virtual machine can be configured to start automatically during the host physical machine system's boot phase. Configuring autostart (both on and off) is performed with virsh autostart.
Device Assignment
- Virtual Function I/O (VFIO) is a kernel driver that provides virtual machines with high-performance access to physical hardware. It attaches PCI devices on the host system directly to the virtual machine, enabling the PCI device to appear and behave as if it were physically attached to the guest. The driver moves device assignment out of the KVM hypervisor and enforces device isolation at the kernel level. This is the default device assignment mechanism in RHEL 7. For more details see RHEL 7 Virtualization Administration Guide - PCI passthrough.
- USB, PCI and SCSI Passthrough. RHEL 7 Virtualization Administration Guide - Guest virtual machine device configuration.
- Single Root I/O Virtualization (SR-IOV).
- N_Port ID Virtualization (NPIV). RHEL 7 Virtualization Administration Guide - Using an NPIV virtual adapter (VHBA) with SCSI devices.
Also see physically shared devices.
Storage and Virtualization
Virtualization Host Storage
Storage Pool
A storage pool is a quantity of storage set aside on the virtualization host for use by guest virtual machines. The storage pool is divided into storage volumes which are assigned to the virtual machines as block devices.
If the pool's underlying storage is directly attached to the host physical machine, the pool is referred to as a local storage pool. Local storage pools include local directories, directly attached disks, physical partitions and logical volume management (LVM) volumes. Local storage pools do not support live migration. It is also possible for the storage not to be physically attached to the virtualization host, but accessible over the network. This is the case for networked (shared) storage pools, which consist of storage devices shared over a network using standard protocols, such as NFS, iSCSI, GFS2, etc.
The storage pool is managed by libvirt on the virtualization host. The size of the storage pool may not exceed the size of the underlying physical storage. Storage pools store virtual machine images or are attached to virtual machines as additional storage. Multiple guests can share the same storage pool.
If not configured otherwise, libvirt uses a directory-based storage pool located in /var/lib/libvirt/images/.
The list of storage pools on the virtualization hosts can be obtained with:
virsh pool-list
Additional information about a storage pool status and usage can be obtained with:
virsh pool-info
Detailed information about a storage pool (such as the device on which it resides, the mount point on the virtualization host filesystem, size, etc.) can be obtained with:
virsh pool-dumpxml
The 'pool-dumpxml' command returns the block device on which the storage pool resides, and 'df' statistics provide information on how much space is left on the device and whether the storage pool can be expanded. For example, 'virsh pool-dumpxml' reports that 'main-storage-pool' resides on /dev/sda7, and 'df' reports that 29% of the device is unused:
/dev/sda7 875G 620G 255G 71% /main-storage-pool
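The arithmetic above can be reproduced by parsing the 'df' line with awk; the line is hard-coded here for illustration, but the same filter works on live df output:

```shell
# Field 5 of a df line is the use percentage; strip the '%' sign and
# subtract from 100 to get the unused share of the device.
df_line="/dev/sda7 875G 620G 255G 71% /main-storage-pool"
echo "$df_line" | awk '{ gsub(/%/, "", $5); printf "used: %d%% free: %d%%\n", $5, 100 - $5 }'
# → used: 71% free: 29%
```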
Storage pools can be configured as follows:
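As an illustration, a directory-based local pool can be described with an XML document of the following shape (the pool name is hypothetical; the target path is libvirt's default image directory). A file like this would be registered with virsh pool-define, then activated with virsh pool-start and virsh pool-autostart:

```shell
# A sketch of a directory-based storage pool definition; the element
# names follow the libvirt storage pool schema, the values are illustrative.
cat > images-pool.xml <<'EOF'
<pool type='dir'>
  <name>images</name>
  <target>
    <path>/var/lib/libvirt/images</path>
  </target>
</pool>
EOF
grep -o '<name>.*</name>' images-pool.xml   # → <name>images</name>
```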
Storage Volume
A storage pool is divided into storage volumes, which are abstractions of physical partitions, LVM logical volumes, file-based disk images and other storage types handled by libvirt. Storage volumes are presented to virtual machines as local storage devices, regardless of the underlying hardware. Once defined, the storage volumes' paths can be declared in the storage section of the guest virtual machine XML definition.
A storage volume is uniquely identified by a volume key. The format of the volume key depends upon the storage used. When used with block based storage, such as LVM, the volume key may be a unique string of alphanumeric characters. When used with file based storage, the key may be the full path to the volume storage.
There are three ways to refer to a specific volume:
- Using the name of the volume and the storage pool.
- Using the full path to the storage on the host physical machine file system.
- Using a unique volume key.
Note that storage pools and volumes are optional; guest virtual machines may operate without them, but in that case system administrators must ensure the availability of the guest virtual machine's storage using alternate tools. Storage pools and volumes provide a way for libvirt to ensure that a particular piece of storage will be available for a guest virtual machine.
Storage Volume Operations
Disk Image File Formats
qcow2
qcow is a file format for disk image files used by QEMU; the name stands for "QEMU Copy on Write". It is a representation of a fixed-size block device in a file. It uses a disk storage optimization strategy that delays the allocation of storage until it is actually needed. Files in qcow format can contain a variety of disk images, generally associated with a specific guest operating system (see "Virtual Machine Image Formats" below). qcow2 is an updated version of the qcow format, intended to supersede it. The main difference is that qcow2 supports multiple virtual machine snapshots.
qcow offers backing files; copy-on-write support, where the image only records changes made to the underlying disk image; snapshot support, where the image can contain multiple snapshots of its history; compression; and encryption. qcow files can be used to instantiate virtual machines from template images. qcow2 files are typically more efficient to transfer over the network, because only the sectors actually written by the virtual machine are allocated in the image.
The qcow format implies a base image (also called a backing file) and one or more disposable copy-on-write overlay disk images laid on top of the base image. This model is useful in development and test environments, where one can quickly revert to a known state and discard the overlay.
raw
It is also known as the QEMU raw format. It contains the content of the disk with no additional metadata. Raw files can be pre-allocated or sparse. Sparse files allocate host disk space on demand and are therefore a form of thin provisioning. Pre-allocated files are fully provisioned and have higher performance than sparse files. Raw files are desirable when disk I/O performance is critical and transferring the image file over a network is rarely necessary.
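The sparse behavior can be demonstrated with a plain file: the apparent size is the full device size, while the allocated size only grows as data is written. A real raw image would typically be created with qemu-img create -f raw, which also produces a sparse file by default.

```shell
# Create a 1 GiB sparse file without writing any data into it, then
# compare the apparent (logical) size with the bytes actually allocated.
img=$(mktemp)
truncate -s 1G "$img"
apparent=$(stat -c %s "$img")                  # logical size in bytes
allocated=$(( $(stat -c %b "$img") * 512 ))    # allocated 512-byte blocks, in bytes
echo "apparent=$apparent allocated=$allocated" # allocated is far smaller than apparent
rm -f "$img"
```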
Virtual Machine Image
The virtual machine image file can only be stored on a host file system, such as xfs, ext4 or NFS.
Virtual machine images can be manipulated with qemu-img.
Virtual Machine Image Formats
Disk Image Snapshots
Disk image snapshot definition.
Snapshots of a disk image can be taken with the qemu-img snapshot command. When taking snapshots, it is recommended to apply the following snapshot tag name conventions.
Virtual Machine Snapshot
A virtual machine snapshot is a snapshot of the complete virtual machine, including CPU state, RAM, device state and the content of all writable disks. VM snapshots can be taken if there is at least one non-removable, writable block device using the qcow2 disk image format.
Guest-Side Storage
The storage for virtual machines is abstracted from the physical storage attached to the host. The storage is attached to the virtual machine using paravirtualized or emulated block device drivers, deployed within the virtual machine. Commonly used storage devices:
- virtio-blk is a paravirtualized storage device which provides the best I/O performance, but has fewer features than virtio-scsi.
- virtio-scsi is a paravirtualized storage device for guests using a large number of disks or advanced features.
- IDE emulated device. IDE performance is lower than virtio-blk or virtio-scsi but is widely compatible with many different systems. Also see emulated devices.
- CD-ROM device. ATAPI CD-ROM is an emulated device. virtio-scsi CD-ROM is also available.
- USB mass storage devices. Also see emulated devices.
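Which of the above devices a guest disk uses is chosen in the <target> element of the disk definition in the domain XML: bus='virtio' selects virtio-blk, while 'scsi', 'ide' and 'usb' select the alternatives. A sketch of a virtio-blk disk (the image path is hypothetical):

```shell
# Write a sample disk element of the kind embedded in a domain
# definition; driver type qcow2 refers to the image format discussed above.
cat > demo-disk.xml <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
EOF
grep -o "bus='[a-z]*'" demo-disk.xml   # → bus='virtio'
```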
Naming Conventions
Networking
Security and Virtualization
KVM virtual machines use SELinux and sVirt to enforce security.
sVirt
sVirt is a technology included in RHEL 7 to integrate SELinux and virtualization. It applies Mandatory Access Control (MAC) to improve security when using virtual machines.
Steal Time
"Steal time" is the percentage of time a virtual CPU waits for real CPU while the hypervisor is servicing another virtual processor.
A high value means the physical CPU is overcommitted and more physical CPUs should be allocated to the environment, or the VM should be moved to a different physical server. Steal time is reported in the st column of vmstat.
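On a Linux system the same counter can also be read from /proc/stat, where the ninth field of the aggregate cpu line accumulates steal time in clock ticks; a sketch expressing it as a percentage of all CPU time since boot:

```shell
# Sum all counters on the "cpu" summary line of /proc/stat and report
# the steal field ($9) as a percentage of the total.
awk '/^cpu / {
    total = 0
    for (i = 2; i <= NF; i++) total += $i
    printf "steal: %.2f%%\n", ($9 / total) * 100
}' /proc/stat
```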