Terraform Concepts

From NovaOrdis Knowledge Base
Revision as of 01:31, 9 January 2022 by Ovidiu (talk | contribs) (→‎Module)

Internal

Overview

Terraform is a tool for building, changing and managing infrastructure as code. It uses a configuration language named Hashicorp Configuration Language (HCL). Terraform is platform agnostic, and achieves that by using different provider APIs for resource provisioning, via plug-ins. A heterogeneous environment can therefore be managed with the same workflow. Terraform is one of the tools that can be used to manage generic Infrastructure as Code stacks.

Workflow

The typical Terraform workflow is:

  • Scope - define what resources are needed.
  • Author - create the configuration file in HCL.
  • Initialize - run terraform init in the project directory that contains the configuration files. This downloads the required provider plug-ins.
  • Plan and Apply - terraform plan (verification) and then terraform apply.

Configuration

The set of files used to describe infrastructure is known as Terraform configuration. Configuration files have a .tf extension.

"Configuration" is an important concept, and Hashicorp documentation refers to it repeatedly. A somewhat appropriate synonym for it would be "infrastructure project". Terraform was built to help manage and enact change: the configuration is changed locally, and Terraform builds an execution plan that only modifies what is necessary to reach the desired state. Changes in configuration are “applied” with terraform apply. Configuration files are plain text, so they can be version controlled like any other source code. State can be version controlled as well, but it may contain sensitive data and is more commonly shared through a remote backend.

Hashicorp Configuration Language (HCL)

HCL is human-readable. Configuration can also be written in JSON, but JSON is only recommended when the configuration is generated by a machine. HCL is the declarative language that drives the provider APIs for resource provisioning. It supports input variables, output variables, etc. For more details, see:

Hashicorp Configuration Language
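
As a minimal sketch of the input and output variables mentioned above, the fragment below declares one input variable and echoes it back as an output. The names "region" and "region_in_use" and the default value are assumptions made for illustration only.

```hcl
# Input variable: can be overridden on the command line or in a .tfvars file.
variable "region" {
  type        = string
  default     = "us-west-2" # hypothetical default
  description = "Region in which resources are provisioned"
}

# Output variable: printed after terraform apply and readable by calling modules.
output "region_in_use" {
  value = var.region
}
```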

Provider

https://www.terraform.io/docs/providers/index.html

A provider is responsible for creating and managing resources. Terraform uses provider plug-ins to translate its configuration into API instructions for the provider. A provider is specified in a "provider" block in a configuration file. Multiple provider blocks can exist in a Terraform configuration file.
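
The sketch below illustrates the point about multiple provider blocks, using the AWS provider as an example. The regions and the alias "east" are hypothetical; a resource opts into the aliased provider through its provider argument.

```hcl
# Default AWS provider configuration.
provider "aws" {
  region = "us-west-2"
}

# A second configuration of the same provider, distinguished by an alias.
provider "aws" {
  alias  = "east"
  region = "us-east-1"
}

# This resource uses the aliased provider instead of the default one.
resource "aws_s3_bucket" "replica" {
  provider = aws.east
  bucket   = "example-replica-bucket" # hypothetical bucket name
}
```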

Provider Plug-In

Provider-specific resources are managed with provider plug-ins. Each provider plug-in is an encapsulated binary, distributed separately from Terraform. Plug-ins are downloaded by terraform init and stored in a subdirectory of the current working directory.
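
Assuming Terraform 0.13 or later, the plug-ins that terraform init downloads can be pinned in a required_providers block. The version constraint below is a hypothetical example, not a recommendation.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws" # registry address of the provider plug-in
      version = "~> 4.0"        # hypothetical version constraint
    }
  }
}
```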

Available Providers

AWS

Terraform AWS Provider

Helm

Terraform Helm Provider

Kubernetes

Terraform Kubernetes Provider

Resource

https://www.terraform.io/docs/configuration/resources.html

A Terraform resource represents an actual resource that exists in the infrastructure. A resource can be a physical component, such as an EC2 instance, or a logical resource, such as an application. A Terraform resource has a type and a name. In a configuration file, a resource is described in a "resource" block.

The primary kind of resource, declared by a resource block, is known as a managed resource. A managed resource is different from a data resource, which provides read-only data exposed as a data source. Both kinds of resources take arguments and export attributes for use in configuration, but while managed resources cause Terraform to create, update, and delete infrastructure objects, data resources cause Terraform only to read objects. For brevity, managed resources are often referred to just as "resources" when the meaning is clear from context.

resource "resource-type" "local-name" {
  ...
}
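
A concrete instance of the block above might look like the following sketch. The AMI ID is a placeholder and would have to be replaced with one that exists in the target region.

```hcl
# Type "aws_instance", local name "example"; together they form the
# identifier aws_instance.example within the module.
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"
}
```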

Resource Type

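The resource type determines the kind of infrastructure object the resource manages, for example "aws_instance". The resource type and name together serve as an identifier for a given resource and so must be unique within a module.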

Resource Name

The resource name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The resource type and name together serve as an identifier for a given resource and so must be unique within a module.

Resource Syntax

Resource HCL Syntax Details

Resource Dependencies

https://learn.hashicorp.com/terraform/getting-started/dependencies

Resource parameters may use information from other resources. This relationship is expressed syntactically via an interpolation expression.

instance = aws_instance.example.id

If the resources are not dependent, they can be created in parallel, which will be done by Terraform whenever possible.

Implicit Dependency

Implicit dependencies via interpolation expressions are the primary way to inform Terraform about these relationships and should be used whenever possible.
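
As a sketch of an implicit dependency, the Elastic IP below references the ID of an instance, so Terraform creates the instance first. The AMI ID is hypothetical.

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"
}

resource "aws_eip" "ip" {
  # Referencing aws_instance.example.id is what creates the dependency.
  instance = aws_instance.example.id
}
```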

Explicit Dependency

Explicit dependencies are expressed with “depends_on”. They are needed when the dependency is not visible to Terraform, for example when it is configured inside the application code, and so has to be explicitly mirrored in the infrastructure configuration.

depends_on = [aws_s3_bucket.example]
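
In context, the depends_on line above could sit in a resource block like the one sketched here. The bucket name and AMI ID are hypothetical; the scenario assumed is an application on the instance that expects the bucket to exist, which Terraform cannot infer on its own.

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "example-application-bucket" # hypothetical bucket name
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"

  # No attribute of the bucket is referenced here, so the ordering
  # has to be declared explicitly.
  depends_on = [aws_s3_bucket.example]
}
```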

Tainted Resource

When provisioning fails, resources are marked as "tainted". Resources can also be tainted manually with the “taint” command. This command does not modify infrastructure, but it modifies the state file to mark the resource as tainted: the next plan will show that the resource will be destroyed and recreated.

Data Source

https://www.terraform.io/docs/configuration/data-sources.html

A data source allows data to be fetched or computed for use in Terraform configuration, in a read-only manner, from a data resource. The underlying resource is queried, but not created, updated or destroyed, unlike in the managed resource case. Use of data sources allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration. A data source is accessed via a special kind of resource known as a data resource, declared with a data block.

data "data-source-name" "local-name" {
  ...
}

The data block requests Terraform to read from a given data source (for example "aws_ami") and export the result under the given local name. The name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The data source and name together serve as an identifier for a given resource and so must be unique within a module.
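
A sketch of the "aws_ami" case follows: the data block looks up an existing image, and a managed resource consumes the exported id attribute. The local names and the name filter pattern are illustrative assumptions.

```hcl
# Read-only lookup: nothing is created, updated or destroyed.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"] # illustrative name pattern
  }
}

# The managed resource uses the attribute exported by the data resource.
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
}
```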

Provisioning

In this context, provisioning means initialization of the resources created by the “apply” step by performing software provisioning. Another name for provisioning is instance initialization.

Provisioner

https://www.terraform.io/docs/provisioners/index.html

A provisioner uploads files, runs shell scripts, or installs and triggers other software such as configuration management tools. By default, a provisioner is only run when the resource is created. The provisioner is declared inside a resource block with the “provisioner” keyword.

resource "aws_instance" "example" {
  … 
  provisioner "local-exec" {
    # "self" refers to the enclosing resource; a provisioner cannot refer to its parent resource by name
    command = "echo ${self.public_ip} > ip_address.txt"
  }
}

Multiple provisioner blocks can be added.
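
The following sketch shows two provisioner blocks in one resource. The second one is a destroy-time provisioner (when = destroy), which is an addition to what the text above covers. The AMI ID is hypothetical.

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"

  # Creation-time provisioner (the default): runs after the instance is created.
  provisioner "local-exec" {
    command = "echo ${self.public_ip} > ip_address.txt"
  }

  # Destroy-time provisioner: runs before the instance is destroyed.
  provisioner "local-exec" {
    when    = destroy
    command = "echo 'destroying ${self.id}'"
  }
}
```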

Failed Provisioner

https://learn.hashicorp.com/terraform/getting-started/provision#failed-provisioners-and-tainted-resources

If a resource is successfully created but fails during provisioning, it is marked as “tainted”.

Available Provisioners

Module

A module is a self-contained package of Terraform configuration that is managed as a group. Modules support modular infrastructure in Terraform. Modules are used to create reusable components, and treat pieces of infrastructure as a black box. There has been a change in semantics in Terraform 0.12. Modules can be nested to decompose complex systems into manageable components. A module may include automated tests, examples and documentation.

A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers. Hashicorp documentation recommends against writing modules that are just thin wrappers around existing resources. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.

Root Module

When terraform apply is executed, all .tf files in the working directory from which terraform is executed form the root module. The root module may call other modules and connect them together by passing output values from one to input values of another. The .terraform directory is created by default in the root module directory by terraform init. The local state file terraform.tfstate is also placed by default in the root module directory.

Using a Module

https://www.terraform.io/docs/configuration/modules.html

To call a module from its dependent module means to include the contents of that module into the configuration with specific values for its input variables. The intention to call (or use) a module is declared in a module block, specified within the dependent module. The block contains the source and a set of input values, which are listed in the module's "Inputs" documentation. The only required attribute is source, which tells Terraform where the dependency module can be retrieved. It is also highly recommended to specify the module's version. Terraform automatically downloads and manages modules. Terraform can retrieve modules from a variety of sources, including the local filesystem, Terraform Registry, private module registries, Git and HTTP. For more details see Remote Module below.

terraform {
  required_version = "0.11.11"
}

provider "aws" {
  ...
}

module "consul" {
  source      = "hashicorp/consul/aws"
  version     = "0.7.3"
  num_servers = "3"
}

Module Sources

TODO: https://www.terraform.io/docs/modules/sources.html

Syntax: Module Block Syntax

Local Module
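
A local module is referenced by a filesystem path that starts with ./ or ../. The sketch below assumes a hypothetical directory modules/network, next to the calling configuration, that declares an input variable named cidr_block.

```hcl
module "network" {
  # Path relative to the calling module; the leading ./ marks it as local.
  source = "./modules/network"

  # Input value passed to the module (hypothetical variable name).
  cidr_block = "10.0.0.0/16"
}
```

The version attribute applies only to modules installed from a registry, so it is left out for a local path.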

Remote Module

https://www.terraform.io/docs/modules/sources.html

Terraform can retrieve modules from a variety of remote sources, including Terraform Registry, private module registries, GitHub, Git and HTTP.

GitHub

source = "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3"

TODO This did not work, more research is needed. The error message was:

Error: Failed to download module

Could not download module "test" (root.tf:2) source code from
"github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3":
subdir "modules/consul-cluster%253Fref=v0.7.3" not found
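
A possible explanation for the error above, not verified against this particular configuration: Terraform expects a subdirectory inside a repository to be separated from the repository address by a double slash, otherwise the query string is treated as part of the path. Under that assumption, the module block would look like this sketch.

```hcl
module "test" {
  # "//" separates the repository from the subdirectory inside it;
  # "?ref=" selects the Git tag, branch or commit.
  source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.7.3"

  # Input variables required by the module are omitted in this sketch.
}
```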

Module Examples (from Terraform Registry)

https://registry.terraform.io/modules/hashicorp/consul/aws/0.7.3
https://github.com/hashicorp/terraform-aws-consul

Module Versioning

Writing a Terraform Module - Module Versioning

Module Syntax

Module HCL Syntax Details

Module Initialization

If a new module is referenced in configuration, it is necessary to run, or re-run, terraform init, which obtains and installs the new module's source code.

Module Outputs

A module's outputs are values produced by the module, for example the ID of each resource it creates. An output is referenced with:

${module.module-name.output-name}
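
The sketch below shows both ends of a module output, assuming Terraform 0.12 or later, where the reference can be written without the ${...} wrapper. The module path, the resource aws_vpc.main inside it and the output name vpc_id are all hypothetical.

```hcl
# File modules/network/outputs.tf (hypothetical module that declares aws_vpc.main)
output "vpc_id" {
  value = aws_vpc.main.id
}

# File main.tf in the calling module
module "network" {
  source = "./modules/network"
}

resource "aws_subnet" "example" {
  vpc_id     = module.network.vpc_id # module.<module-name>.<output-name>
  cidr_block = "10.0.1.0/24"
}
```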

Module Local Values

A local value assigns a name to an expression, allowing it to be used multiple times within the module without repeating it. For more details on locals, see:

Hashicorp Configuration Language - locals
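
As a small sketch of a local value, the map of tags below is defined once and reused by two resources. The tag keys, values and bucket names are made up for illustration.

```hcl
locals {
  # Named expression, evaluated once and reused within the module.
  common_tags = {
    Project     = "demo"
    Environment = "dev"
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-demo-logs" # hypothetical bucket name
  tags   = local.common_tags
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-demo-assets" # hypothetical bucket name
  tags   = local.common_tags
}
```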

Module Destruction

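When a module block is removed from the configuration and the change is applied, or when terraform destroy is run, all resources created by the module will be destroyed.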

Writing a Module

Writing a Terraform Module

Terraform Registry

Terraform Registry includes ready-to-use modules for various common purposes - they can serve as larger building-blocks for the infrastructure.

https://registry.terraform.io/

Modules exposed in the Terraform Registry can be specified using their relative path to the registry root URL. For more details, see:

Terraform Module Block - How to Specify a Remote Module

State

https://www.terraform.io/docs/state/index.html

The normal Terraform workflow consists of reading configuration, which is essentially codified infrastructure in the form of .tf files, and enacting the specification by instantiating or changing managed resources. Terraform modifies the state of the platform it acts upon. Normally, there should be no need to represent that state, as it would be reflected in the managed resources themselves. However, accessing resources to read their state every time that state is needed could be impractical and inefficient, especially when the size of the problem is large.

The solution Terraform came up with is to represent and cache the managed resources' state. This representation, which sits between configuration and the real world instantiation of that configuration, is known as the Terraform state. The state is used by Terraform to map the real world resources to configuration, and it improves performance for large infrastructures. Terraform uses this state to create plans prior to applying infrastructure changes and changing the state of the real infrastructure. The state can be explicitly synced with the terraform refresh command.

From an implementation perspective, the state can be thought of as a database that maps configuration to actual managed resources by maintaining the association between the configuration name of the resource (aws_instance "my-instance") and real resource IDs, for example EC2 VM i-13df65f04f8d10cce. Alongside the mapping between configuration and remote objects, Terraform maintains in the state metadata such as resource dependencies. Normally, a dependency relationship is defined in configuration, but if the user modifies the configuration and deletes one of the ends of the relationship, the real world needs to be "adjusted", and the relationship metadata in the state is the only piece of information left that reflects that reality until the reality is adjusted. To ensure correct operation, Terraform retains a copy of the most recent set of dependencies within the state. This way, Terraform can still determine the correct order of destruction from the state when the operator deletes one or more items from configuration. Terraform also stores other metadata, such as a pointer to the provider configuration that was most recently used with the resource in situations where multiple aliased providers are present.

In addition to basic mapping, Terraform stores a cache of the attribute values for all resources in the state. This is an optional feature and it is done only as a performance improvement, because for larger infrastructures, querying every resource every time it is needed is too slow. Many cloud providers do not provide APIs to query multiple resources at once, and the round trip time for each resource is hundreds of milliseconds. On top of this, cloud providers almost always have API rate limiting so Terraform can only request a certain number of resources in a period of time. In these scenarios, the cached state is treated as the record of truth.

State Format

The state is in JSON format. Terraform promises backward compatibility with the state file. It provides a "version" field on the state contents that allows the implementation to transparently move the format forward.

Sharing State

The state, as explained above, represents the source of truth for Terraform operations, so when more than one user concurrently interacts with a configuration and the set of resources provisioned from that configuration, it is important for everyone to be working with the same state so that operations will be applied to the same remote objects. In this situation, properly sharing state becomes important.

Locking

https://www.terraform.io/docs/state/locking.html

Some backends - the maintainers of state - will lock the state for writing. Locking prevents other users from writing concurrently and potentially corrupting the state. State locking happens automatically on all operations that could write state. If state locking fails, Terraform will not continue. State locking can be disabled for most commands with the -lock=false flag, but this is not recommended. Note that not all backends support locking.

State can be forcefully unlocked with terraform force-unlock.

State Operations

When running terraform plan, Terraform relies on its state to know the actual state of the resources involved in planning. By default, for every terraform plan and terraform apply, Terraform will sync all resources in the state. terraform show displays the state. Even though the state is JSON, direct file editing is discouraged; the terraform state command provides a CLI that allows basic modification. Terraform manages state backup automatically. When state is maintained locally, it creates terraform.tfstate.backup.

Local State

By default, the state is stored locally in the root module directory, as a JSON file named terraform.tfstate.

Remote State

https://www.terraform.io/docs/state/remote.html

Alternatively, state can be stored on a remote backend and shared between all members of the team. Aside from sharing, remote state allows output delegation to other teams: infrastructure resources can be shared in a read-only way without relying on any additional configuration store. Syntactically this can be expressed with a terraform_remote_state data source.
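
A sketch of output delegation with terraform_remote_state follows, assuming Terraform 0.12 or later and an S3 backend. The bucket, key and region are hypothetical, and the other configuration is assumed to declare an output named subnet_id.

```hcl
# Read-only view of another configuration's state.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"   # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical state path
    region = "us-west-2"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"

  # Only root-level outputs of the other configuration are exposed.
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```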

When using remote state, state is only ever held in memory when used by Terraform, and it is not written on local storage. This approach has implications for security-sensitive data.

Backend

https://www.terraform.io/docs/backends/

A backend is the maintainer of state, the abstraction that determines how state is loaded and how an operation such as apply is executed. By default, Terraform uses the local backend. Various backends have different features: some backends can store the data remotely and protect that state with locks to prevent corruption. Some backends can keep sensitive information off disk, as state is retrieved from backend on demand and only stored in memory. Some advanced backends support remote operations, which enable the operation to execute on the backend, instead of the local machine.

Backends are configured in Terraform files, in the terraform block. The backend configuration can be changed at any time: both the configuration itself and the type of the backend. Terraform will automatically detect any changes in configuration and request a reinitialization. As part of the reinitialization process, Terraform will ask whether to migrate existing state to the new configuration.
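
A backend declaration inside the terraform block could look like the sketch below, using the S3 backend as an example. The bucket, key, region and table names are hypothetical, and the DynamoDB table is only relevant for Terraform versions that use it for state locking.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state" # hypothetical bucket that holds the state
    key     = "prod/terraform.tfstate"  # path of the state object in the bucket
    region  = "us-west-2"
    encrypt = true

    # Optional: DynamoDB table used for state locking (hypothetical name).
    dynamodb_table = "example-terraform-locks"
  }
}
```

After adding or changing this block, terraform init has to be run again so that the backend is reinitialized.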

Local Backend

TODO:

  • Even if everything read so far seems to indicate that by default a "local backend" is used, and the terraform.tfstate file is the local backend, there's no explicit "backend" specification anywhere in .terraform, so what is the difference between the default terraform.tfstate and a local backend?
  • Is .terraform directory part of the local backend or it is something else?

terraform.tfstate

This file contains the entire state of a specific configuration, in JSON format, and it can be backed up by simply making a copy of the file. The file is created in the root module directory by default when terraform apply is executed. The content of the state file can be shown with terraform show.

.terraform Directory

The .terraform directory is created by default in the root module by the terraform init command and it has the following structure:

.terraform
    │ 
    ├── plugins
    │    └── darwin_amd64
    │          ├── lock.json
    │          └── terraform-provider-aws_v2.49.0_x4
    │
    └── modules
         ├── dependency-module-1
         │      └── terraform-aws-modules-terraform-aws-eks-908c656
         │           ├── outputs.tf
         │           ├── variables.tf
         │           └── main.tf
         └── modules.json

.terraform contains the following:

  • the plugins subdirectory. Its content is updated when terraform init is executed.
  • the modules subdirectory, where the dependency modules (and their modules, recursively) are cached. Its content is updated when terraform get or terraform init is executed.

.terraform.tfstate.lock.info

The file is created by default in the root module while an operation that modifies state is underway.

{
  "ID":"121f201d-a963-35fd-5d93-9061a4168511",
  "Operation":"OperationTypeApply",
  "Info":"",
  "Who":"ovidiufeodorov@Ovidiu-Feodorov.local",
  "Version":"0.12.13",
  "Created":"2020-02-20T02:38:55.865036Z",
  "Path":"terraform.tfstate"
}

terraform.tfstate.backup

The file is created by default in the root module. It holds the automatic backup of the local state.

Remote Backend

Remote backends are designed to maintain and share remote state. With a fully-featured remote state backend, Terraform can use remote locking as a measure to avoid two or more different users accidentally running Terraform at the same time, and thus ensure that each Terraform run begins with the most recent updates.

Amazon S3 Remote Backend

Amazon S3 Remote Backend

Terraform Cloud Remote Backend

Terraform Cloud Remote Backend

State Import

https://www.terraform.io/docs/import/index.html

Terraform is able to import existing infrastructure, which allows resources created by some other means to be brought under Terraform management.
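
Assuming Terraform 1.5 or later, an import can be declared in configuration with an import block, as sketched here. On older versions the equivalent is the terraform import command. The instance ID and AMI ID are hypothetical.

```hcl
# Declares that an existing EC2 instance should be brought under management
# as aws_instance.example on the next plan and apply.
import {
  to = aws_instance.example
  id = "i-0123456789abcdef0" # hypothetical ID of the existing instance
}

# The resource block describing the imported object must exist as well.
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical; should match the real instance
  instance_type = "t2.micro"
}
```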

Security-Sensitive Data

Terraform state may contain sensitive data. Be aware that when using local state, state is stored in a plain-text JSON file.

Workspaces

A workspace is a partition of a backend. Initially, any backend has only one workspace, called "default". Certain backends support multiple named workspaces, allowing multiple states per backend to be associated with a single configuration. The configuration still has only one backend, but multiple distinct instances of that configuration can be deployed without configuring a new backend or changing authentication credentials.

Backends that support multiple workspaces include the Amazon S3 remote backend, the local backend, etc.
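
Within configuration, the name of the current workspace is available as terraform.workspace. The sketch below uses it to vary the instance count and the tags per workspace; the counts and the AMI ID are hypothetical.

```hcl
resource "aws_instance" "example" {
  # More instances in the "default" workspace, a single one elsewhere.
  count = terraform.workspace == "default" ? 3 : 1

  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "web-${terraform.workspace}"
  }
}
```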

TODO: https://www.terraform.io/docs/state/workspaces.html