Infrastructure Concepts: Difference between revisions

From NovaOrdis Knowledge Base
 
(203 intermediate revisions by the same user not shown)


=Internal=
* [[Infrastructure as Code#Subjects|Infrastructure as Code]]
* [[Infrastructure as Code Concepts|Infrastructure as Code Concepts]]
* [[Continuous Delivery]]


=Overview=
In a cloud environment, infrastructure, as viewed by the user, is no longer represented by hardware, but by virtual constructs like servers, [[Amazon_VPC_Concepts#Subnet|subnets]] and block devices. The hardware still exists, but infrastructure elements accessible to users "float" across it, can be manipulated by the [[#Infrastructure_Platform|infrastructure platform]] APIs and can be created, duplicated, changed and destroyed at will. They are referred to as [[#Infrastructure_Resource|infrastructure resources]]. Infrastructure resources can be instantiated, changed and destroyed by [[Infrastructure as Code#Subjects|infrastructure as code]] to provide the infrastructure foundation for [[#Application_Runtime|application runtimes]] and [[#Applications|applications]].
In a cloud environment, infrastructure no longer means hardware, but virtual constructs like servers, [[Amazon_VPC_Concepts#Subnet|subnets]] and block devices. The hardware still exists, but the infrastructure elements accessible to users "float" across it, can be manipulated by the [[#Infrastructure_Platform|infrastructure platform]] APIs and can be created, duplicated, changed and destroyed at will. They are referred to as [[#Infrastructure_Resource|infrastructure resources]]. Infrastructure resources can be instantiated, changed and destroyed by [[Infrastructure as Code#Subjects|infrastructure as code]] to provide the infrastructure foundation for [[#Application_Runtime|application runtimes]] and [[#Applications|applications]].
 
Infrastructure should be seen as a domain in its own right, the domain of building, delivering and running software, hence it should be subject to [[DDD|Domain-Driven Design]].
=Infrastructure Platform=
An infrastructure platform is a set of [[#Infrastructure_Resource|infrastructure resources]] and the tools and services that manage them. The infrastructure platform lets its users provision compute, storage, networking and other infrastructure resources on which they can run operating systems and applications. Services like [[#Database|databases]] can be categorized as [[#Application_Runtime_Service|application runtime services]] belonging to the [[#Application_Runtime|application runtime layer]], or as [[#Composite_Resources|composite resources]] exposed by the [[#Infrastructure_Platform|infrastructure platform]]. The infrastructure resources are managed dynamically via an API, which is a defining characteristic of a [[#Cloud|cloud]]. This is also the essential element that allows expressing the resources as [[Infrastructure_as_Code_Concepts#Overview|code]]. The infrastructure platform abstracts out the hardware and the virtualization layers. The service users do not control the underlying hardware resources, except for select networking configuration or perhaps the physical location of the resources at a coarse geographical level. On the other hand, they control the operating system, the middleware and the application code, but they must also install and manage them. Infrastructure as a service is targeted at teams that are building out infrastructure. The infrastructure platforms are also known as '''Infrastructure as a Service''' (IaaS). Examples of public cloud platforms: AWS, Azure, GCP, Oracle Cloud. Examples of private cloud platforms: OpenStack, VMware vCloud.
==<span id='Infrastructure_Resource'></span>Infrastructure Resources==
There are three essential resources provided by an infrastructure platform: [[#Compute|compute]], [[#Network|network]] and [[#Storage|storage]]. Sometimes these resources are referred to as "primitives". An [[#Infrastructure_Platform|infrastructure platform]] abstracts infrastructure resources from physical hardware. Infrastructure resources are assembled to provide [[#Application_Runtime|application runtime]] instances. Infrastructure resources can be expressed as [[Infrastructure_as_Code_Concepts#Infrastructure_Code|code]] and grouped in [[Infrastructure as Code Concepts#Stack|stacks]].
===Compute===
A compute resource executes code. In essence, any compute resource is backed by physical server CPU cores, but the infrastructure platform exposes them in more useful ways: [[#Physical_Server|physical servers]], [[#Virtual_Machine|virtual machines]], [[#Server_Cluster|server clusters]], [[#Container|containers]], [[#Application_Hosting_Cluster|application hosting clusters]] and [[#Serverless_Runtimes|serverless runtimes]].
====<span id='Physical_Server'></span>Physical Servers====
The infrastructure may provision and expose physical servers on demand. These are also called "bare metal".
====<span id='Virtual_Machine'></span>Virtual Machines====
Virtual machines are exposed by hypervisors that run on top of a pool of physical servers, managed by the infrastructure platform.
====<span id='Server_Cluster'></span>Server Clusters====
A server cluster is a pool of server instances, either [[#Virtual_Machine|virtual machines]] or [[#Physical_Server|physical servers]], that the infrastructure platform provisions and manages as a group. These are called [[Amazon_EC2_Auto-Scaling_Concepts#Auto-Scaling_Group|auto-scaling groups]] on AWS, [[Azure_Compute_Concepts#Azure_Virtual_Machine_Scale_Sets|Azure virtual machine scale sets]] on Azure and [[Google_Compute_Engine_Concepts#Managed_Instance_Groups_.28MIGs.29|Google managed instance groups]] on GCP.
====<span id='Container'></span>Containers====
Some infrastructure platforms offer Container as a Service infrastructure, which allows deploying and running container instances.
====<span id='Container_Cluster'></span><span id='Application_Hosting_Cluster'></span>Container Clusters====
A container cluster, also referred to as '''application hosting cluster''', is a pool of servers onto which the infrastructure platform deploys and manages multiple applications. This model separates the concerns of deploying and orchestrating applications from the concerns of provisioning and configuring the infrastructure in general, and servers they run on in particular.
 
Kief Morris argues against treating Kubernetes as a cloud abstraction layer, although my own experience is less dire and shows that it can actually be done:
 
::''Managed Kubernetes are not a cloud abstraction layer: At first glance, vendor-managed Kubernetes clusters appear to be a great solution for developing and running applications transparently across different cloud platforms. Now you can build applications for Kubernetes and run them on whatever cloud you like. In practice, although an application cluster might be useful as one part of your application runtime layer, much more work is required to create a full runtime platform. And achieving true abstraction from the underlying cloud platforms is not trivial. Applications need access to other resources than the compute provided by the cluster, including storage and networking. These resources will be provided differently by different platforms, unless you build a solution to abstract those. You also need to provide services such as monitoring, identity management, and secrets management. Again, you will either use a different service on each cloud platform, or build a service abstraction layer that you can deploy and maintain on each cloud. So the application cluster is actually a small piece of your overall application hosting platform. And even that piece tends to vary from one cloud to the next, with different versions, implementations, and tooling for the core Kubernetes system.''
 
A container cluster should not be confused with a [[#PaaS|PaaS]]. The container cluster manages provisioning of compute resources for applications, which is one of the core functions of a PaaS, but a PaaS provides a variety of services beyond compute. Examples of container clusters are [[Amazon_ECS_Concepts#Overview|Amazon Elastic Container Service (ECS)]], [[Amazon_EKS_Concepts#Overview|Amazon Elastic Kubernetes Service (EKS)]], [[AKS_Concepts#Overview|Azure Kubernetes Service (AKS)]] and [[GKE Concepts#Overview|Google Kubernetes Engine (GKE)]]. A container cluster can be deployed and managed on an on-premises infrastructure platform using a supported Kubernetes release such as [[OpenShift]].
 
Cluster deployment tools: Helm, WeaveCloud, AWS ECS Services, Azure App Service Plans.
 
<font color=darkkhaki>CNAB Cloud Native Application Bundle.
 
HashiCorp Nomad, Apache Mesos.
</font>
 
Also see: {{Internal|Infrastructure_as_Code_Concepts#Container_Clusters_as_Code|Container Clusters as Code}}
 
====Application Servers====
Java-based application servers: Tomcat, WebSphere, WebLogic, JBoss.
 
====Application Server Clusters====
A cluster of [[#Application_Servers|application servers]].
 
====<span id='Serverless_Runtimes'></span>Serverless Runtimes====
Serverless runtimes, also known as Function as a Service (FaaS) runtimes, execute code on demand, in response to events or schedules. Examples: AWS Lambda, Azure Functions and Google Cloud Functions.
 
===Network===
An infrastructure platform provides the following network primitives: [[#Network_Address_Block|network address blocks]], [[#VLAN|VLANs]], [[#Route|routes]], etc.
====<span id='Network_Address_Block'></span>Network Address Blocks====
A network address block is a fundamental structure for grouping resources to control routing of traffic between them and isolate them from other resources that do not belong to the group. Network address blocks are known as [[Amazon_VPC_Concepts#Overview|VPCs]] in AWS and as virtual networks in Azure and GCP. The top level block is divided into smaller blocks known as [[#VLAN|VLANs]].
 
====<span id='VLAN'></span> VLANs====
VLANs are smaller sub-divisions of a [[#Network_Address_Block|network address block]]. They are known as [[Amazon_VPC_Concepts#Subnet|subnets]] in AWS.
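
The relationship between a top-level network address block and its sub-divisions can be illustrated with a minimal sketch that uses only the Python standard library. The address range <code>10.0.0.0/16</code> and the <code>/18</code> prefix are arbitrary values chosen for illustration, and the helper name <code>split_address_block</code> is hypothetical; a real platform performs this allocation through its own API.

```python
import ipaddress


def split_address_block(cidr, new_prefix):
    """Divide a top-level address block into equally sized smaller blocks."""
    block = ipaddress.IPv4Network(cidr)
    return list(block.subnets(new_prefix=new_prefix))


# Hypothetical top-level block, split into four smaller blocks.
subnets = split_address_block("10.0.0.0/16", 18)

for subnet in subnets:
    # Each smaller block is a contiguous, non-overlapping slice of the parent.
    print(subnet, subnet.num_addresses)
```

Because the smaller blocks never overlap, routes and access rules can be attached to each one independently.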
====Names====
This includes DNS names, which are mapped onto IP addresses.
 
====<span id='Route'></span>Routes====
A route configures what traffic is allowed between and within address blocks.
====Gateways====
A gateway directs traffic in and out of network address blocks.
 
====Load Balancing Rules====
Forward connections coming into a single address to a pool of resources.
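
As a rough illustration of such a rule, the sketch below forwards each incoming connection to the next member of a pool in round-robin order. Round-robin is only one possible distribution policy and is an assumption here, as are the class name <code>RoundRobinRule</code> and all addresses; none of them come from a specific platform.

```python
import itertools


class RoundRobinRule:
    """Hypothetical load balancing rule: one listen address, a pool of targets."""

    def __init__(self, listen_address, pool):
        if not pool:
            raise ValueError("the target pool must not be empty")
        self.listen_address = listen_address
        # cycle() repeats the pool endlessly, in order.
        self._targets = itertools.cycle(list(pool))

    def forward(self):
        """Return the pool member that should receive the next connection."""
        return next(self._targets)


# Illustrative addresses only.
rule = RoundRobinRule("203.0.113.10:443", ["10.0.1.5:8443", "10.0.2.5:8443"])
targets = [rule.forward() for _ in range(3)]
```

After the last pool member is used, the rule wraps around to the first one again.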
====Proxies====
Accept connections and use rules to transform or route them.
====API Gateways====
Handle authentication and throttling.
====VPNs====
Connect different network address blocks across locations so they appear to be part of a single network.
====Direct Connections====
Dedicated network connections between a cloud environment and a data center.
====Network Access Rules (Firewall Rules)====
====Asynchronous Messaging====
Queues for messages.
Also see: {{Internal|Asynchronous Communication|Asynchronous Communication}}
 
====Caches====
 
===Storage===
====Block Storage====
{{Internal|Storage_Concepts#Block_Storage|Block storage}}
====Object Storage====
{{Internal|Storage_Concepts#Object_Storage|Object storage}}
 
====Networked File System Storage====
{{Internal|Storage_Concepts#Networked_Filesystems|Networked Filesystem Storage}}
 
====Structured Data Storage====
 
===<span id='Composite_Resource'></span>Composite Resources===
Cloud platforms combine primitive infrastructure resources into composite resources. The line between a primitive resource and a composite resource is arbitrary, as is the line between a composite infrastructure resource and an [[#Application_Runtime_Service|application runtime service]]. Composite resources are sometimes referred to as "platform services".
====<span id='Database'></span>Database as a Service====
Most infrastructure platforms provide managed Database as a Service (DBaaS) that can be defined and managed as code. They may be standard commercial or open source database applications such as MySQL or PostgreSQL, [[NoSQL#Column_Stores|column stores]], [[NoSQL#Document_Databases|document databases]], [[NoSQL#Graph_Databases|graph databases]] or [[NoSQL#Distributed_Key-Value_Stores|distributed key-value stores]].
====Load Balancing====
====DNS====
====Identity Management====
====Secrets Management====
Storage for security-sensitive configuration such as passwords and keys.
 
==Infrastructure Services==
 
=<span id='PaaS'></span>Application Runtime=
The application runtime layer provides [[#Application_Runtime_Service|application runtime services]] and capabilities to the [[#Application|application]] layer. It consists of [[#Container_Cluster|container clusters]],  [[#Serverless|serverless execution environments]], application servers, messaging systems, databases and operating systems.  Services like [[#Database|databases]] can be considered application runtime services that belong to the application runtime layer, but at the same time, they can be seen as [[#Composite_Resource|composite resources]] exposed by the [[#Infrastructure_Platform|infrastructure platform]]. An application runtime is laid upon the infrastructure platform layer and it is assembled from [[#Infrastructure_Resource|infrastructure resources]]. The parts of the application runtime layer map to the parts of the infrastructure platform. These will include an execution environment based around compute resources, data management built on storage resources and connectivity composed of networking resources.
 
In theory, it could be useful to provide application runtimes as a complete set of services to developers, shielding them from the details of the underlying infrastructure. In practice, the lines are much fuzzier. Different teams need access to resources at different levels of abstractions, with different levels of control. It's in general a good idea to NOT implement systems with absolute boundaries (<font color=darkkhaki>what about layering and the [[Law of Demeter|Law of Demeter]] then?</font>), but instead define pieces that can be composed and presented in different ways to different users.
 
The Application Runtime layer is also referred to as '''Platform as a Service''' (PaaS) or "cloud application platform" and can be exposed directly to users, as is the case for EKS, AKS, OpenShift, GKE, etc.
 
==Application Runtime Service==
The line between application runtime services, living in the application runtime layer, and [[#Composite_Resource|composite resources]], living in the infrastructure platform layer, is blurred. That is why application runtime services are sometimes referred to as "platform services".
==Typical Application Runtime Services==
===Service Discovery===
Applications and services running in the application runtime need to know how to find other applications and services. The application runtime fulfills this need via DNS, resource tags, a configuration registry, sidecars or an API gateway. Also see: {{Internal|Microservices#Service_Discovery|Microservices &#124; Service Discovery}}
 
===Monitoring===
===Log Management===
===Identity Management===
===Secrets Management===
 
==Application-Driven Infrastructure==
<font color=darkkhaki>TO CONTINUE.</font>
 
=Application=
'''Applications''' and '''services''' provide domain-specific capabilities to organizations and users. Everything in the underlying layers ([[#Application_Runtime|application runtime]], [[#Infrastructure_Platform|infrastructure platform]]) exists to enable this layer.
Applications provide domain-specific capabilities to organizations and users. They exist in the form of application packages, container instances or serverless code. The underlying layers ([[#Application_Runtime|application runtime]], [[#Infrastructure_Platform|infrastructure platform]]) exist to enable this layer. Applications can be directly offered to users as part of a cloud service delivery model, under the generic name of '''Software-as-a-Service''' (SaaS). The users do not need to manage anything, but they also do not control anything, including the design of the application. This works well, unless the customer needs functionality that is not available in the application. Examples: Intuit QuickBooks, salesforce.com, Spark-based batch services aimed at data scientists, etc.
==Deployable Parts of an Application==
===Executables===
The core of an application release is represented by executable code, packaged in binaries or libraries. The dependencies could be bundled with the application deployment (a container includes most of the operating system as well as the application that will run with it), or they can be provisioned as part of the infrastructure. Different target runtimes require different package formats.
===Configuration Defaults===


=Application Runtime=
===Data Structures===
The '''application runtime''' layer provides services and capabilities to the [[#Application|application]] layer. An application runtime is laid upon the [[#Infrastructure_Platform|infrastructure platform]] layer and it is assembled from [[#Infrastructure_Resource|infrastructure resources]]. The runtime instance may include [[#Server|servers]], [[#Cluster|clusters]] and [[#Serverless|serverless execution environments]].
When the application uses a database, the deployment may create or update schemas, including converting existing data when the schema changes. A given version of the schema usually corresponds to a version of the executable. Schema migration, which means updating data structures, should be a concern of the application deployment process, rather than of the application runtime or the infrastructure platform. There are tools to assist with this task: Flyway, DBDeploy, Liquibase, db-migrate. However, infrastructure and application runtime services need to support maintaining data when infrastructure or other underlying resources change or fail. See [[Infrastructure_as_Code_Concepts#State_Continuity|State Continuity]].
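
The idea of versioned schema migration applied at deployment time can be sketched as follows. This is a simplified illustration using an in-memory SQLite database, not the way Flyway, Liquibase or the other tools named above work internally. The <code>schema_version</code> table, the <code>customer</code> table and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical ordered migrations: (schema version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the highest applied schema version, or 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every migration newer than the recorded version, in order."""
    applied = []
    version = current_version(conn)
    for target, statement in MIGRATIONS:
        if target > version:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (target,))
            applied.append(target)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies both migrations
second_run = migrate(conn)  # nothing left to apply
```

Recording the applied version in the database itself is what lets a deployment be re-run safely: the second call finds nothing to do.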


=Infrastructure Platform=
===Reference Data===
==<span id='Infrastructure_Resources'></span>Infrastructure Resources==
The initial set of data, which may change with the version.
A cloud infrastructure platform abstracts infrastructure resources (compute, network, storage) from physical hardware. Infrastructure resources are assembled to provide [[#Application_Runtime|application runtime]] instances.
===Connectivity===
This refers to inbound and outbound connectivity with dependencies. An application deployment may specify network configuration, such as network ports and elements used to support connectivity like certificates and keys. One way to do it is to define addressing, routing, naming and firewall rules as part of an [[Infrastructure_as_Code_Concepts#Stack_Project|infrastructure stack project]], and then deploy the application into the resulting infrastructure.


==Infrastructure Services==
===Application Configuration===


==Infrastructure Stack==
==Application Deployment is Not Infrastructure Management==
See: {{Internal|Infrastructure_as_Code_Concepts#Using_Infrastructure_Code_to_Deploy_Applications|Infrastructure as Code Concepts &#124; Using Infrastructure Code to Deploy Applications}}


<font color=darkkhaki>
=Cloud=
* Stack integration point.
{{External|https://www.nist.gov/programs-projects/nist-cloud-computing-program-nccp}}
</font>
{{External|http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf}}
NIST cloud definition:
{{Note|Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics (On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, Measured Service); three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)); and, four deployment models (Private cloud, Community cloud, Public cloud, Hybrid cloud). Key enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server computers, and (3) high-performance virtualization for commodity hardware.}}
==Hybrid Cloud==
Hybrid cloud means hosting applications and services for a system across both private infrastructure and a public cloud service. This is required for legacy services that can't be migrated to public cloud or for legal reasons, where data must reside in countries the public cloud provider does not have a presence in.
==Cloud Agnostic==
Systems that can run on multiple public cloud platforms. This is done to avoid lock-in to one vendor, but in practice it results in lock-in to software that promises to hide the differences between clouds.
==Polycloud==
Running an application or service on more than one public cloud platform.


=Environment=


=Cloud=
<font color=darkkhaki>NIST definition: https://www.nist.gov/programs-projects/nist-cloud-computing-program-nccp</font>
=<span id='Container'></span>Containers=
=<span id='Clusters'></span>Cluster=
Line 36: Line 167:
Cluster as code.
Cluster as code.
</font>
</font>
* [[#Server_Cluster|Server cluster]]
* [[#Container_Cluster|Container cluster]]
* [[#Application_Server_Clusters|Application server cluster]]


=<span id='Servers'></span>Server=
=<span id='Serverless'></span>Serverless Execution Environment=
=Configuration=
Also see: {{Internal|Infrastructure_as_Code_Concepts#Stack_Configuration|Infrastructure as Code Concepts &#124; Stack Configuration}}


=Configuration Drift=
==Configuration Drift==
Configuration drift is variation that happens over time across systems that were once identical. Manually making changes in configuration (performance optimizations, permissions, fixes), even if the base was laid down by automation, causes configuration drift. Selectively using automation on some of the initially identical systems, but not on others, also causes configuration drift. This is how [[Infrastructure_as_Code_Concepts#Snowflake_System|snowflake systems]] come into existence. Also see [[Infrastructure_as_Code_Concepts#Minimize_Variation|Minimize Variation]]. Once manually-introduced configuration drift occurs, trust in automation goes down, because people are not sure how the automation will modify a manually-changed system. In turn, manual configuration creeps in because the automation is not run frequently and consistently, leading to a vicious circle. To avoid this spiral, [[Infrastructure_as_Code_Concepts#Make_Everything_Reproducible|make everything reproducible]] and run automation automatically and consistently. Operational automation combined with good [[Monitoring|monitoring]] exposes configuration drift.
===Preventing Configuration Drift===
There are several things that can be done to avoid configuration drift:
* Minimize automation lag. Automation lag is the time that passes between instances of running an automated process. The longer it has been since the last time the process ran, the more likely it will fail. This is not necessarily because the code changed, but because another part of the system, such as a dependency, changed, or because someone made a manual fix or improvement and neglected to fold it back into the code. The more frequently you apply infrastructure code, the less likely it is to fail.
* Avoid ad-hoc apply. Use automation not only to create new infrastructure, but also to update existing infrastructure.
* Apply code continuously. Even if nothing changed.
* Use immutable infrastructure. Rather than applying configuration code frequently to an infrastructure instance, apply it only once, when you create the instance. When the configuration changes, make a new instance and swap it out for the old one.
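
Drift can only be corrected once it is detected. The following minimal sketch compares the configuration declared in code with the configuration observed on a running system. The function name <code>detect_drift</code> and all setting names and values are hypothetical; a real implementation would obtain the observed values from the infrastructure platform API or from monitoring.

```python
def detect_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every setting that differs."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)  # None if the setting is not declared in code
        have = actual.get(key)   # None if the setting is absent on the system
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical values: what the code declares versus what the system reports.
desired = {"instance_type": "m5.large", "open_ports": [443], "log_level": "info"}
actual = {"instance_type": "m5.large", "open_ports": [22, 443], "log_level": "debug"}

drift = detect_drift(desired, actual)
```

In this example the manually opened port 22 and the changed log level show up as drift, while the unchanged instance type does not.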
==Configuration Registry==
A configuration registry is a service that stores configuration values that may be used for many purposes, including service discovery, [[Infrastructure_as_Code_Concepts#Integration_Registry_Lookup|stack integration]], etc. The configuration registry can be used to provide [[Infrastructure_as_Code_Concepts#Parameter_Registry|stack configuration values]] when stacks are being instantiated. Using a configuration registry separates configuration from implementation. Parameters in the registry can be set, used and viewed by other tools. The configuration registry can act as a Configuration Management Database (CMDB), a source of truth for the system configuration.
If the configuration registry properly protects security-sensitive information, this is a serious advantage because it eliminates the need for another service specialized in protecting secrets.
On the downside, the configuration registry becomes a dependency for other components of the system and possibly a single point of failure. Since such a component is required in disaster recovery, it becomes part of the critical path.
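
The set, use and view operations described above can be sketched with a small in-memory stand-in. This is not the API of etcd, Consul or any other real registry. The class <code>ConfigRegistry</code>, the key layout and the values are assumptions made for illustration.

```python
class ConfigRegistry:
    """In-memory stand-in for a configuration registry (hypothetical API)."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        """Store a configuration value under a hierarchical key."""
        self._values[key] = value

    def get(self, key, default=None):
        """Return the value stored under key, or default if it is not set."""
        return self._values.get(key, default)

    def list(self, prefix):
        """Return all key/value pairs whose key starts with prefix."""
        return {k: v for k, v in self._values.items() if k.startswith(prefix)}


registry = ConfigRegistry()

# A hypothetical networking stack publishes its outputs ...
registry.set("/prod/network/vpc_id", "vpc-example")
registry.set("/prod/network/subnet_id", "subnet-example")

# ... and an application stack looks them up when it is instantiated.
params = registry.list("/prod/network/")
```

The producing stack and the consuming stack only share key names, which is how the registry separates configuration from implementation.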
===Configuration Registry Implementations===
Infrastructure tool-specific configuration registries (may create lock-in and may not be open to interact with other tools):
* Ansible Tower
* Chef Infra Server
* PuppetDB
* Salt Mine
* [[AWS Systems Manager Parameter Store#Overview|AWS Systems Manager Parameter Store]]
* [[AWS Config]] (?)
General purpose tools that may be used as configuration registries are:
* [[etcd]]
* [[HashiCorp Consul]]
* [[Zookeeper]]
Configuration registries can be implemented using an existing file storage service like an object store, GitHub, a networked file server, a standard relational or document database or even a web server. These services can be quick to implement for a simple project, but when you get beyond trivial situations, you may find yourself building and maintaining functionality that you could get off the shelf.
===Combining Multiple Configuration Registries===
As various tools may come with their own, there could be situations when different parts of the configuration are maintained in different registries, each serving as source of truth for its own area.
==Configuration Hierarchy==
Most tools support a chain of configuration options with a predictable hierarchy of precedence (command line parameters take precedence over environment variables, which take precedence over configuration files).
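
The precedence order described above can be sketched with <code>collections.ChainMap</code> from the Python standard library, which searches its mappings from left to right and returns the first match. The function name <code>resolve_config</code> and the setting names and values are hypothetical.

```python
from collections import ChainMap


def resolve_config(cli_args, env_vars, file_values):
    """Combine configuration sources; earlier sources take precedence."""
    return ChainMap(cli_args, env_vars, file_values)


# Hypothetical values for each source.
config = resolve_config(
    cli_args={"region": "us-west-2"},
    env_vars={"region": "us-east-1", "log_level": "debug"},
    file_values={"region": "eu-west-1", "log_level": "info", "replicas": "3"},
)

region = config["region"]        # set on the command line
log_level = config["log_level"]  # not on the command line, so the environment wins
replicas = config["replicas"]    # only present in the configuration file
```

Settings that a higher-precedence source does not define fall through to the next source in the chain.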
==Security-Sensitive Configuration==
Systems need secrets to provision or operate. It is essential to store and handle secrets securely. First off, secrets should never be put in the code.
<font color=darkkhaki>TO PROCESS [[IaC|IaC]] Chapter 7. Configuring Stack Instances → Handling Secrets as Parameters</font>


=Governance=
==Lightweight Architectural Governance==
Lightweight architectural governance aims to balance autonomy and centralized control. <font color=darkkhaki>More in [https://www.amazon.com/EDGE-Value-Driven-Transformation-Jim-Highsmith-ebook/dp/B07WFL74JR/ EDGE: Value-Driven Digital Transformation] by Jim Robert Highsmith, Linda Luu, David Robinson and the [https://conferences.oreilly.com/software-architecture/sa-ny-2019/public/schedule/detail/71911.html The Goldilocks zone of lightweight architectural governance] Jonny LeRoy talk.</font>
=Security=
<font color=darkkhaki>
* [[IaC]] Chapter 3. Infrastructure Platform → Network Resources → Zero-trust security model with SDN.
</font>


=Organizatorium=

Latest revision as of 16:32, 5 October 2023

External

Internal

Overview

In a cloud environment, infrastructure no longer means hardware, but virtual constructs like servers, subnets and block devices. The hardware still exists, but the infrastructure elements accessible to users "float" across it, can be manipulated by the infrastructure platform APIs and can be created, duplicated, changed and destroyed at will. They are referred to as infrastructure resources. Infrastructure resources can be instantiated, changed and destroyed by infrastructure as code to provide the infrastructure foundation for application runtimes and applications.

Infrastructure should be seen as a domain in its own right, the domain of building, delivering and running software, hence it should be subject to Domain-Driven Design.

Infrastructure Platform

An infrastructure platform is a set of infrastructure resources and the tools and services that manage them. The infrastructure platform lets its users provision compute, storage, networking and other infrastructure resources on which they can run operating systems and applications. Services like databases can be categorized as application runtime services belonging to the application runtime layer, or as composite resources exposed by the infrastructure platform. The infrastructure resources are managed dynamically via an API, which is a defining characteristic of a cloud. This is also the essential element that allows expressing the resources as code. The infrastructure platform abstracts out the hardware and the virtualization layers. The service users do not control the underlying hardware resources, except for select networking configuration or perhaps the physical location of the resources at a coarse geographical level. On the other hand, they control the operating system, the middleware and the application code, but they must also install and manage them. Infrastructure as a service is targeted at teams that are building out infrastructure. The infrastructure platforms are also known as Infrastructure as a Service (IaaS). Examples of public cloud platforms: AWS, Azure, GCP, Oracle Cloud. Examples of private cloud platforms: OpenStack, VMware vCloud.

Infrastructure Resources

There are three essential resources provided by an infrastructure platform: compute, network and storage. Sometimes these resources are referred to as "primitives". An infrastructure platform abstracts infrastructure resources from physical hardware. Infrastructure resources are assembled to provide application runtime instances. Infrastructure resources can be expressed as code and grouped in stacks.

Compute

A compute resource executes code. In essence, any compute resource is backed by physical server CPU cores, but the infrastructure platform exposes them in more useful ways: physical servers, virtual machines, server clusters, containers, application hosting clusters and serverless runtimes.

Physical Servers

The infrastructure may provision and expose physical servers on demand. These are also called "bare metal".

Virtual Machines

Virtual machines are exposed by hypervisors that run on top of a pool of physical servers, managed by the infrastructure platform.

Server Clusters

A server cluster is a pool of server instances, either virtual machines or physical servers, that the infrastructure platform provisions and manages as a group. These are called auto-scaling groups on AWS, virtual machine scale sets on Azure and managed instance groups on GCP.

Containers

Some infrastructure platforms offer Container as a Service infrastructure, which allows deploying and running container instances.

Container Clusters

A container cluster, also referred to as an application hosting cluster, is a pool of servers onto which the infrastructure platform deploys and manages multiple applications. This model separates the concerns of deploying and orchestrating applications from the concerns of provisioning and configuring the infrastructure in general, and the servers the applications run on in particular.

Kief Morris argues against treating Kubernetes as a cloud abstraction layer, although my own experience is less dire and shows that one can actually do that:

Managed Kubernetes clusters are not a cloud abstraction layer: At first glance, vendor-managed Kubernetes clusters appear to be a great solution for developing and running applications transparently across different cloud platforms. Now you can build applications for Kubernetes and run them on whatever cloud you like. In practice, although an application cluster might be useful as one part of your application runtime layer, much more work is required to create a full runtime platform. And achieving true abstraction from the underlying cloud platforms is not trivial. Applications need access to other resources than the compute provided by the cluster, including storage and networking. These resources will be provided differently by different platforms, unless you build a solution to abstract those. You also need to provide services such as monitoring, identity management, and secrets management. Again, you will either use a different service on each cloud platform, or build a service abstraction layer that you can deploy and maintain on each cloud. So the application cluster is actually a small piece of your overall application hosting platform. And even that piece tends to vary from one cloud to the next, with different versions, implementations, and tooling for the core Kubernetes system.

A container cluster should not be confused with a PaaS. The container cluster manages provisioning of compute resources for applications, which is one of the core functions of a PaaS, but a PaaS provides a variety of services beyond compute. Examples of container clusters are Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). A container cluster can be deployed and managed on an on-premises infrastructure platform using a supported Kubernetes release such as OpenShift.

Cluster deployment tools: Helm, WeaveCloud, AWS ECS Services, Azure App Service Plans.

CNAB Cloud Native Application Bundle.

HashiCorp Nomad, Apache Mesos.

Also see:

Container Clusters as Code

Application Servers

Java-based application servers: Tomcat, Websphere, Weblogic, JBoss.

Application Server Clusters

A cluster of application servers.

Serverless Runtimes

Serverless runtimes, also known as Function as a Service (FaaS) runtimes, execute code on demand, in response to events or schedules. Examples: AWS Lambda, Azure Functions and Google Cloud Functions.
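As a minimal sketch of the code such a runtime executes, the function below follows the handler(event, context) shape that AWS Lambda uses for Python functions. The event field name and the layout of the response are assumptions made for illustration, not something prescribed by this page.

```python
import json


def handler(event, context):
    """Entry point that the FaaS runtime would invoke once per event."""
    # "name" is a hypothetical field of the incoming event.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }


if __name__ == "__main__":
    # Local invocation; a real runtime supplies its own context object.
    print(handler({"name": "infra"}, None))
```

The function holds no server state between invocations, which is what lets the platform start and stop instances on demand.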

Network

An infrastructure platform provides the following network primitives: network address blocks, VLANs, routes, etc.

Network Address Blocks

A network address block is a fundamental structure for grouping resources to control routing of traffic between them and isolate them from other resources that do not belong to the group. Network address blocks are known as VPCs in AWS and as virtual networks in Azure and GCP. The top level block is divided into smaller blocks known as VLANs.

VLANs

VLANs are smaller sub-divisions of a network address block. They are known as subnets in AWS.
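The relationship between a top-level address block and its subdivisions can be sketched with Python's standard ipaddress module. The 10.0.0.0/16 range and the /24 subdivision size are arbitrary values chosen for illustration.

```python
import ipaddress

# Hypothetical top-level network address block.
block = ipaddress.ip_network("10.0.0.0/16")

# Divide the block into /24 subdivisions and keep the first three.
subnets = list(block.subnets(new_prefix=24))[:3]

for subnet in subnets:
    print(subnet, "addresses:", subnet.num_addresses)

# Find which subdivision a given address belongs to.
address = ipaddress.ip_address("10.0.1.17")
owner = next(s for s in subnets if address in s)
print(address, "belongs to", owner)
```

Note that cloud platforms typically reserve a few addresses in each subdivision, so the number of addresses usable by resources is somewhat lower than the raw count.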

Names

This includes DNS names, which are mapped onto IP addresses.

Routes

A route configures what traffic is allowed between and within address blocks.

Gateways

A gateway directs traffic in and out of network address blocks.

Load Balancing Rules

Forward connections coming into a single address to a pool of resources.
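A minimal sketch of such a rule follows, assuming a round-robin policy. The policy is one possible choice among several and the backend addresses are hypothetical.

```python
import itertools


class RoundRobinRule:
    """Forwards each incoming connection to the next member of a pool."""

    def __init__(self, pool):
        if not pool:
            raise ValueError("pool must not be empty")
        self._members = itertools.cycle(list(pool))

    def forward(self, connection_id):
        # Only selects a backend; a real load balancer would also proxy traffic.
        return connection_id, next(self._members)


rule = RoundRobinRule(["10.0.1.10", "10.0.1.11", "10.0.1.12"])
targets = [rule.forward(i)[1] for i in range(4)]
print(targets)
```

The fourth connection wraps around to the first backend, so load spreads evenly while clients only ever see the single front address.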

Proxies

Accept connections and use rules to transform or route them.

API Gateways

Handle authentication and throttling.

VPNs

Connect different network address blocks across locations so they appear to be part of a single network.

Direct Connections

Dedicated network connections between a cloud and a data center.

Network Access Rules (Firewall Rules)

Asynchronous Messaging

Queues for messages.

Also see:

Asynchronous Communication

Caches

Storage

Block Storage

Block storage

Object Storage

Object storage

Networked File System Storage

Networked Filesystem Storage

Structured Data Storage

Composite Resources

Cloud platforms combine primitive infrastructure resources into composite resources. The line between a primitive resource and a composite resource is arbitrary, as is the line between a composite infrastructure resource and an application runtime service. Composite resources are sometimes referred to as "platform services".

Database as a Service

Most infrastructure platforms provide managed Database as a Service (DBaaS) that can be defined and managed as code. They may be standard commercial or open source database applications such as MySQL or PostgreSQL, column stores, document databases, graph databases or distributed key-value stores.

Load Balancing

DNS

Identity Management

Secrets Management

Storage for security-sensitive configuration such as passwords and keys.

Infrastructure Services

Application Runtime

The application runtime layer provides application runtime services and capabilities to the application layer. It consists of container clusters, serverless execution environments, application servers, messaging systems, databases and operating systems. Services like databases can be considered application runtime services that belong to the application runtime layer, but at the same time, they can be seen as composite resources exposed by the infrastructure platform. An application runtime is laid upon the infrastructure platform layer and it is assembled from infrastructure resources. The parts of the application runtime layer map to the parts of the infrastructure platform. These will include an execution environment based around compute resources, data management built on storage resources and connectivity composed of networking resources.

In theory, it could be useful to provide application runtimes as a complete set of services to developers, shielding them from the details of the underlying infrastructure. In practice, the lines are much fuzzier. Different teams need access to resources at different levels of abstraction, with different levels of control. It is in general a good idea to NOT implement systems with absolute boundaries (what about layering and the Law of Demeter then?), but instead to define pieces that can be composed and presented in different ways to different users.

The Application Runtime layer is also referred to as Platform as a Service (PaaS) or "cloud application platform" and can be exposed directly to users, as is the case for EKS, AKS, OpenShift, GKE, etc.

Application Runtime Service

The line between application runtime services, living in the application runtime layer, and composite resources, living in the infrastructure platform layer, is blurred. That is why application runtime services are sometimes referred to as "platform services".

Typical Application Runtime Services

Service Discovery

Applications and services running in the application runtime need to know how to find other applications and services. The application runtime fulfills this need via DNS, resource tags, a configuration registry, a sidecar or an API Gateway. Also see:

Microservices | Service Discovery

Monitoring

Log Management

Identity Management

Secrets Management

Application-Driven Infrastructure

TO CONTINUE.

Application

Applications provide domain-specific capabilities to organizations and users. They exist in the form of application packages, container instances or serverless code. The underlying layers (application runtime, infrastructure platform) exist to enable this layer. Applications can be directly offered to users as part of a cloud service delivery model, under the generic name of Software-as-a-Service (SaaS). The users do not need to manage anything, but they also do not control anything, including the design of the application. This works well, unless the customer needs functionality that is not available in the application. Examples: Intuit Quickbooks, batch services based on Spark, aimed at data scientists, salesforce.com, etc.

Deployable Parts of an Application

Executables

The core of an application release is represented by executable code, packaged in binaries or libraries. The dependencies could be bundled with the application deployment (a container includes most of the operating system as well as the application that will run with it), or they can be provisioned as part of the infrastructure. Different target runtimes require different package formats.

Configuration Defaults

Data Structures

When the application uses a database, the deployment may create or update schemas, including converting existing data when the schema changes. A given version of the schema usually corresponds to a version of the executable. Schema migration (updating data structures) should be a concern of the application deployment process, rather than of the application runtime or the infrastructure platform. There are tools to assist with this task: Flyway, DBDeploy, Liquibase, db-migrate. However, infrastructure and application runtime services need to support maintaining data when infrastructure or other underlying resources change or fail. See State Continuity.
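The idea of tying a schema version to a deployment can be sketched with the standard sqlite3 module. This is a simplified illustration, not how any of the tools named above is implemented. The table names, the schema_version bookkeeping table and the migration statements are all hypothetical.

```python
import sqlite3

# Ordered, hypothetical migrations: (schema version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the highest applied schema version, or 0 for a new database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every migration newer than the recorded schema version."""
    applied = []
    version = current_version(conn)
    for target, statement in MIGRATIONS:
        if target > version:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (target,)
            )
            applied.append(target)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies versions 1 and 2
second_run = migrate(conn)  # nothing left to apply
print(first_run, second_run)
```

Because the applied version is recorded in the database itself, running the deployment step again is harmless, which is what allows it to run on every release.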

Reference Data

The initial set of data, which may change with the version.

Connectivity

This refers to inbound and outbound connectivity with dependencies. An application deployment may specify network configuration, such as network ports and elements used to support connectivity, like certificates and keys. One way to do it is to define addressing, routing, naming and firewall rules as part of an infrastructure stack project, and then deploy the application into the resulting infrastructure.

Application Configuration

Application Deployment is Not Infrastructure Management

See:

Infrastructure as Code Concepts | Using Infrastructure Code to Deploy Applications

Cloud

https://www.nist.gov/programs-projects/nist-cloud-computing-program-nccp
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

NIST cloud definition:


Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics (On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, Measured Service); three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)); and, four deployment models (Private cloud, Community cloud, Public cloud, Hybrid cloud). Key enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server computers, and (3) high-performance virtualization for commodity hardware.

Hybrid Cloud

Hybrid cloud means hosting applications and services for a system across both private infrastructure and a public cloud service. This is required for legacy services that can't be migrated to public cloud or for legal reasons, where data must reside in countries the public cloud provider does not have a presence in.

Cloud Agnostic

Systems that can run on multiple public cloud platforms. This is done to avoid lock-in to one vendor, but in practice it results in lock-in to the software that promises to hide the differences between clouds.

Polycloud

Running an application or service on more than one public cloud platform.

Environment

Containers

Cluster

Cluster as code.

Server

Serverless Execution Environment

Configuration

Also see:

Infrastructure as Code Concepts | Stack Configuration

Configuration Drift

Configuration drift is variation that happens over time across systems that were once identical. Manually making changes in configuration (performance optimizations, permissions, fixes), even if the base was laid down by automation, causes configuration drift. Selectively using automation on some of the initially identical systems, but not on others, also causes configuration drift. This is how snowflake systems come into existence. Also see Minimize Variation. Once manually-introduced configuration drift occurs, the trust in automation goes down, because people are not sure how an automation will modify a manually-changed system. Interestingly, manual configuration creeps in because the automation is not run frequently and consistently, leading to a vicious circle. To avoid this spiral, make everything reproducible automatically and consistently run automation. Operational automation combined with good monitoring exposes configuration drift.
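Drift between systems that were once identical can be exposed by comparing each system's actual settings with the definition they were built from. The sketch below assumes configuration can be represented as flat key/value dictionaries. The setting names and values are hypothetical.

```python
def find_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every differing setting."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        if desired.get(key) != actual.get(key):
            drift[key] = (desired.get(key), actual.get(key))
    return drift


# The definition that automation originally applied to both servers.
desired = {"max_connections": 100, "tls": "1.2", "log_level": "info"}

server_a = dict(desired)  # still managed only by automation
# server_b received a manual tuning change and a leftover manual addition.
server_b = {**desired, "max_connections": 500, "debug_user": "tmp"}

print(find_drift(desired, server_a))
print(find_drift(desired, server_b))
```

A report like this tells people what automation would change on a manually modified system, which helps restore the trust needed to run it consistently.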

Preventing Configuration Drift

There are several things that can be done to avoid configuration drift:

  • Minimize automation lag. Automation lag is the time that passes between instances of running an automated process. The longer it has been since the last time the process ran, the more likely it is to fail. This is not necessarily because the code changed, but because another part of the system, such as a dependency, changed, or because someone made a manual fix or improvement and neglected to fold it back into the code. The more frequently you apply infrastructure code, the less likely it is to fail.
  • Avoid ad-hoc apply. Use automation not only to create new infrastructure, but also to update existing infrastructure.
  • Apply code continuously. Even if nothing changed.
  • Use immutable infrastructure. Rather than applying configuration code frequently to an infrastructure instance, apply it only once, when you create the instance. When the configuration changes, make a new instance and swap it out for the old one.

Configuration Registry

A configuration registry is a service that stores configuration values that may be used for many purposes, including service discovery, stack integration, etc. The configuration registry can be used to provide stack configuration values when stacks are being instantiated. Using a configuration registry separates configuration from implementation. Parameters in the registry can be set, used and viewed by other tools. The configuration registry can act as a Configuration Management Database (CMDB), a source of truth for the system configuration.

If the configuration registry properly protects security-sensitive information, this is a serious advantage because it eliminates the need for another service specialized in protecting secrets.

On the downside, the configuration registry becomes a dependency for other components of the system and possibly a single point of failure. Since such a component is required in disaster recovery, it becomes part of the critical path.
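The set, use and view operations described above can be sketched with an in-memory stand-in for a registry service. The class, its method names and the path-style keys are hypothetical and do not correspond to any particular product's API.

```python
class ConfigRegistry:
    """In-memory stand-in for a configuration registry service."""

    def __init__(self):
        self._values = {}

    def set(self, path, value):
        self._values[path] = value

    def get(self, path, default=None):
        return self._values.get(path, default)

    def entries_under(self, prefix):
        """Return all entries whose path starts with the given prefix."""
        return {k: v for k, v in self._values.items() if k.startswith(prefix)}


registry = ConfigRegistry()

# A networking stack publishes a value when it is instantiated...
registry.set("/staging/network/subnet_id", "subnet-123")
registry.set("/staging/app/instance_count", 3)

# ...and an application stack reads it without knowing how it was produced.
subnet = registry.get("/staging/network/subnet_id")
print(subnet, registry.entries_under("/staging/"))
```

The producing and the consuming stack share only the key, which is the separation of configuration from implementation mentioned above.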

Configuration Registry Implementations

Infrastructure tool-specific configuration registries (may create lock-in and may not be open to interact with other tools):

General purpose tools that may be used as configuration registries are:

Configuration registries can be implemented using an existing file storage service like an object store, GitHub, a networked file server, a standard relational or document database or even a web server. These services can be quick to implement for a simple project, but when you get beyond trivial situations, you may find yourself building and maintaining functionality that you could get off the shelf.

Combining Multiple Configuration Registries

As various tools may come with their own registries, there could be situations when different parts of the configuration are maintained in different registries, each serving as the source of truth for its own area.

Configuration Hierarchy

Most tools support a chain of configuration options with a predictable hierarchy of precedence (command line parameters take precedence over environment variables, which take precedence over configuration files).
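The precedence order in the example above can be sketched with collections.ChainMap from the standard library, which looks a key up in each mapping in turn and returns the first match. The parameter names and values are hypothetical.

```python
from collections import ChainMap


def resolve(cli_args, env_vars, file_values):
    """Earlier mappings win: command line, then environment, then file."""
    return ChainMap(cli_args, env_vars, file_values)


file_values = {"region": "us-east-1", "instance_count": "2", "log_level": "info"}
env_vars = {"instance_count": "4", "log_level": "debug"}
cli_args = {"log_level": "trace"}

config = resolve(cli_args, env_vars, file_values)
print(config["log_level"], config["instance_count"], config["region"])
```

Each value comes from the highest-precedence source that defines it, so a configuration file can hold defaults that individual runs override.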

Security-Sensitive Configuration

Systems need secrets to provision or operate. It is essential to store and handle secrets securely. First off, secrets should never be put in the code.
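One way to keep a secret out of the code is to have it injected at run time and to fail loudly when it is missing. The sketch below assumes injection through an environment variable, which is only one possible delivery mechanism. The variable name DB_PASSWORD and the helper are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided at run time."""


def read_secret(name, environ=os.environ):
    """Fetch a secret injected at run time; never fall back to a default."""
    value = environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


if __name__ == "__main__":
    try:
        password = read_secret("DB_PASSWORD")
        print("secret loaded, length:", len(password))
    except MissingSecretError as error:
        print(error)
```

The code names the secret but never contains its value, and the value itself is never printed or logged.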

TO PROCESS IaC Chapter 7. Configuring Stack Instances → Handling Secrets as Parameters

Governance

Lightweight Architectural Governance

Lightweight architectural governance aims to balance autonomy and centralized control. More in EDGE: Value-Driven Digital Transformation by Jim Robert Highsmith, Linda Luu, David Robinson and in Jonny LeRoy's talk The Goldilocks zone of lightweight architectural governance.

Security

  • IaC Chapter 3. Infrastructure Platform → Network Resources → Zero-trust security model with SDN.

Organizatorium

  • Integration points