Terraform Concepts

From NovaOrdis Knowledge Base

Revision as of 00:15, 21 February 2020

Internal

Overview

Terraform is a tool for building, changing and managing infrastructure as code. It uses a configuration language named Hashicorp Configuration Language (HCL). Terraform is platform agnostic, and achieves that by relying on provider plug-ins, which use the providers' APIs for resource provisioning. A heterogeneous environment can be managed with the same workflow.

Workflow

The typical Terraform workflow is:

  • Scope - define what resources are needed.
  • Author - create the configuration file in HCL.
  • Initialize – run terraform init in the project directory with the config files. This will download the correct provider plugins.
  • Plan and Apply - terraform plan (verification) and then terraform apply.
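As a sketch of the "Author" step, a minimal configuration for the AWS provider might look like the following. The region, AMI ID and instance type are assumptions chosen for illustration; the AMI ID in particular is a placeholder that has to be replaced with a real one.

```hcl
# main.tf: minimal configuration (hypothetical region, AMI ID and instance type).
provider "aws" {
  region = "us-east-1" # hypothetical region
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"
}
```

Running terraform init in the directory that contains this file downloads the AWS provider plugin; terraform plan then reports one resource to add, and terraform apply creates it.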

Configuration

The set of files used to describe infrastructure is known as Terraform configuration. Configuration files have a .tf extension.

"Configuration" is an important concept, and Hashicorp documentation refers to it repeatedly. A somewhat appropriate synonym for it would be "infrastructure project". Terraform was built to help manage and enact change. The configuration is changed locally and Terraform builds an execution plan that only modifies what is necessary to reach the desired state. Configuration and *state* can be version controlled. How? Changes in configuration are also “applied” with terraform apply.

Hashicorp Configuration Language (HCL)

HCL is human-readable. Configuration can also be written in JSON, but JSON is only recommended when the configuration is generated by a machine. HCL is the declarative language that drives the provider APIs for resource provisioning. It supports input variables, output variables, etc. For more details, see:

Hashicorp Configuration Language
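Input variables are declared with variable blocks and referenced as var.<name>, while output values are declared with output blocks. A minimal sketch, in which the variable and output names are hypothetical:

```hcl
# Input variable: a parameter of the configuration (hypothetical name).
variable "instance_type" {
  type        = string
  default     = "t2.micro"
  description = "EC2 instance type used by the example instance"
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = var.instance_type       # reference to the input variable
}

# Output variable: a value exported by the configuration after apply.
output "public_ip" {
  value = aws_instance.example.public_ip
}
```

The default can be overridden when running Terraform, for example with terraform apply -var 'instance_type=t3.micro'.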

Provider

https://www.terraform.io/docs/providers/index.html

A provider is responsible for creating and managing resources. Terraform uses provider plug-ins to translate its configuration into API instructions for the provider. A provider is specified in a "provider" block in a configuration file. Multiple provider blocks can exist in a Terraform configuration file.
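When several provider blocks configure the same provider, for example to target two AWS regions, every block except the default one carries an alias, and a resource selects a non-default configuration with the provider meta-argument. A sketch, with hypothetical regions and AMI ID:

```hcl
# Default configuration of the "aws" provider.
provider "aws" {
  region = "us-west-2" # hypothetical region
}

# Second configuration of the same provider, distinguished by an alias.
provider "aws" {
  alias  = "east"
  region = "us-east-1" # hypothetical region
}

# Without a provider argument this resource would use the default configuration.
resource "aws_instance" "example" {
  provider      = aws.east
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"
}
```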

Provider Plug-In

Provider-specific resources are managed with provider plugins. Each provider plugin is an encapsulated binary, distributed separately from Terraform. Plugins are downloaded by terraform init and stored in a subdirectory of the current working directory (.terraform/plugins).
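The version of the plugin that terraform init downloads can be constrained in configuration. The sketch below uses the Terraform 0.12 form of required_providers (later Terraform versions use an object with source and version attributes instead); the "~> 2.49" constraint is an assumption that simply mirrors the terraform-provider-aws_v2.49.0 plugin listed in the .terraform directory structure on this page.

```hcl
terraform {
  required_providers {
    # Terraform 0.12 syntax: provider name = version constraint.
    # "~> 2.49" accepts 2.49 and later 2.x releases, but not 3.0.
    aws = "~> 2.49"
  }
}
```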

Available Providers

AWS

Terraform AWS Provider

Kubernetes

https://www.terraform.io/docs/providers/kubernetes/index.html

Helm

https://www.terraform.io/docs/providers/helm/index.html

Resource

https://www.terraform.io/docs/configuration/resources.html

A Terraform resource represents an actual resource that exists in the infrastructure. A resource can be a physical component, such as an EC2 instance, or a logical resource, such as an application. A Terraform resource has a type and a name. In a configuration file, a resource is described in a "resource" block.

The primary kind of resource, declared by a resource block, is known as a managed resource. A managed resource is different from a data resource, which provides read-only data exposed as a data source. Both kinds of resources take arguments and export attributes for use in configuration, but while managed resources cause Terraform to create, update, and delete infrastructure objects, data resources cause Terraform only to read objects. For brevity, managed resources are often referred to just as "resources" when the meaning is clear from context.

resource "resource-type" "local-name" {
  ...
}

Resource Type

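The resource type determines the kind of infrastructure object the resource manages and the arguments it accepts. Each resource type is implemented by a provider, and by convention the type name starts with the provider name: aws_instance is implemented by the aws provider.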

Resource Name

The resource name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The resource type and name together serve as an identifier for a given resource and so must be unique within a module.

Resource Syntax

Resource HCL Syntax Details

Resource Dependencies

https://learn.hashicorp.com/terraform/getting-started/dependencies

Resource parameters may use information from other resources. This relationship is expressed syntactically via an interpolation expression.

instance = aws_instance.example.id

If the resources are not dependent, they can be created in parallel, which will be done by Terraform whenever possible.

Implicit Dependency

Implicit dependencies via interpolation expressions are the primary way to inform Terraform about these relationships and should be used whenever possible.
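A sketch of an implicit dependency, along the lines of the getting-started dependencies guide linked above; the AMI ID is a hypothetical placeholder and the vpc argument assumes the 2.x AWS provider used elsewhere on this page:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"
}

# Referencing aws_instance.example.id creates an implicit dependency:
# Terraform creates the instance first, then the Elastic IP attached to it.
resource "aws_eip" "ip" {
  vpc      = true
  instance = aws_instance.example.id
}
```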

Explicit Dependency

Explicit dependencies are expressed with “depends_on”. They are needed when a dependency is not visible to Terraform, for example when it is established inside the application code, and therefore has to be explicitly mirrored in the infrastructure configuration.

depends_on = [aws_s3_bucket.example]
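In the following sketch, assume that the software running on the instance expects the bucket to exist at startup. The bucket name and AMI ID are hypothetical.

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "example-terraform-dependency-bucket" # hypothetical, must be globally unique
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"

  # Nothing in this block references the bucket, so Terraform cannot infer
  # the dependency; depends_on declares it explicitly.
  depends_on = [aws_s3_bucket.example]
}
```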

Tainted Resource

When provisioning fails, resources are marked as "tainted". Resources can be manually tainted with the “taint” command. This command does not modify infrastructure, but it modifies the state file to mark the resource as tainted – the next plan will show that the resource will be destroyed and recreated.

Data Source

https://www.terraform.io/docs/configuration/data-sources.html

A data source allows data to be fetched or computed for use in Terraform configuration, in a read-only manner, from a data resource. The underlying resource is queried, but not created, updated or destroyed, unlike in the managed resource case. Use of data sources allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration. A data source is accessed via a special kind of resource known as a data resource, declared with a data block.

data "data-source-name" "local-name" {
  ...
}

The data block requests Terraform to read from a given data source ("aws_ami") and export the result under the given local name. The name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The data source and name together serve as an identifier for a given resource and so must be unique within a module.
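A sketch of an "aws_ami" data source whose result is consumed by a managed resource. The owner and the AMI name pattern are assumptions chosen for illustration:

```hcl
# Read-only lookup of the most recent AMI that matches the filter.
data "aws_ami" "example" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"] # assumed AMI naming pattern
  }
}

# Data source attributes are referenced as data.<type>.<name>.<attribute>.
resource "aws_instance" "example" {
  ami           = data.aws_ami.example.id
  instance_type = "t2.micro"
}
```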

Provisioning

In this context, provisioning means initialization of the resources created by the “apply” step by performing software provisioning. Another name for provisioning is instance initialization.

Provisioner

https://www.terraform.io/docs/provisioners/index.html

A provisioner uploads files, runs shell scripts, and installs and triggers other software, such as configuration management tools. By default, a provisioner runs only when the resource is created. The provisioner is declared inside a resource block with the “provisioner” keyword.

resource "aws_instance" "example" {
  … 
  provisioner "local-exec" {
    # Inside a provisioner, "self" refers to the parent resource.
    command = "echo ${self.public_ip} > ip_address.txt"
  }
}

Multiple provisioner blocks can be added.
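Multiple provisioners are executed in the order in which they are declared. A sketch with two local-exec provisioners; the AMI ID and the output file name are hypothetical:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"

  # Runs first: writes the public IP into a local file.
  provisioner "local-exec" {
    command = "echo ${self.public_ip} > ip_address.txt"
  }

  # Runs second: appends the instance ID to the same file.
  provisioner "local-exec" {
    command = "echo ${self.id} >> ip_address.txt"
  }
}
```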

Failed Provisioner

https://learn.hashicorp.com/terraform/getting-started/provision#failed-provisioners-and-tainted-resources

If a resource is successfully created but fails during provisioning, it is marked as “tainted”.

Available Provisioners

Module

A module is a self-contained package of Terraform configuration that is managed as a group. Modules are used to create reusable components, and treat pieces of infrastructure as a black box. There has been a change in semantics in Terraform 0.12. Modules can be nested to decompose complex systems into manageable components. A module may include automated tests, examples and documentation. A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers. Hashicorp documentation recommends against writing modules that are just thin wrappers around existing resources. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.

Root Module

When terraform apply is executed, the .tf files in the working directory from which terraform is executed form the root module. The root module may call other modules and connect them together by passing output values from one to input values of another. The .terraform directory is created by default in the root module directory by terraform init. The local state file terraform.tfstate is also placed by default in the root module directory.

Using a Module

https://www.terraform.io/docs/configuration/modules.html

To call a module from a dependent module means to include the contents of that module into the configuration, with specific values for its input variables. The intention to call (or use) a module is declared in a module block within the dependent module. The block contains the source and a set of input values, which are listed in the module's "Inputs" documentation. The only required attribute is source, which tells Terraform where the module can be retrieved from. It is also highly recommended to specify the module's version. Terraform automatically downloads and manages modules. Terraform can retrieve modules from a variety of sources, including the local filesystem, Terraform Registry, private module registries, Git and HTTP. For more details see Accessing a Remote Module below.

terraform {
  required_version = "0.11.11"
}

provider "aws" {
  ...
}

module "consul" {
  source      = "hashicorp/consul/aws"
  version     = "0.7.3"
  num_servers = "3"
}
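A module can also be called from the local filesystem. In the sketch below the directory and the cidr_block input variable are hypothetical. A local path must begin with ./ or ../ so that Terraform does not interpret it as a registry address, and the version attribute is not used, because version constraints only apply to modules installed from a registry.

```hcl
module "network" {
  # Hypothetical module directory, relative to the calling module.
  source = "./modules/network"

  # Hypothetical input variable declared by the called module.
  cidr_block = "10.0.0.0/16"
}
```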

Accessing a Remote Module

https://www.terraform.io/docs/modules/sources.html

Terraform can retrieve modules from a variety of remote sources, including Terraform Registry, private module registries, GitHub, Git and HTTP.

GitHub

source = "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3"

TODO This did not work, more research is needed. The error message was:

Error: Failed to download module

Could not download module "test" (root.tf:2) source code from
"github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3":
subdir "modules/consul-cluster%253Fref=v0.7.3" not found
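The error message suggests that ?ref=v0.7.3 was treated as part of the subdirectory name (the "?" shows up URL-encoded). Terraform's module source documentation describes a special "//" separator between the repository address and a subdirectory inside it, so a possible fix, not verified against this repository, would be the following. The module's required input variables would still have to be supplied.

```hcl
module "test" {
  # "//" separates the repository from the subdirectory that holds the module;
  # ?ref= selects the Git tag to check out.
  source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.7.3"

  # The required input variables of the consul-cluster module go here.
}
```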

Module Examples (from Terraform Registry)

https://registry.terraform.io/modules/hashicorp/consul/aws/0.7.3
https://github.com/hashicorp/terraform-aws-consul

Using a Module

TODO

Module Versioning

Writing a Terraform Module - Module Versioning

Module Syntax

Module HCL Syntax Details

Module Initialization

If a module is referenced in configuration, it is necessary to run, or re-run, terraform init, which obtains and installs the new module's source code.

Module Outputs

A module's outputs are values produced by the module, such as the ID of each resource it creates. They are referenced as:

${module.module-name.output-name}
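A sketch of a child module that exports an output and a calling module that consumes it. Both files are shown in one block, and the module directory, names and CIDR ranges are hypothetical. In Terraform 0.12 the reference can be written as a plain expression, without the ${ } wrapper.

```hcl
# ---- modules/vpc/main.tf (hypothetical child module) ----
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

output "vpc_id" {
  value = aws_vpc.main.id
}

# ---- main.tf in the calling module ----
module "vpc" {
  source = "./modules/vpc"
}

resource "aws_subnet" "example" {
  vpc_id     = module.vpc.vpc_id # output of the child module
  cidr_block = "10.0.1.0/24"
}
```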

Module Destruction

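When a module is destroyed, for example when its module block is removed from the configuration and the change is applied, all resources created by the module are destroyed.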

Writing a Module

Writing a Terraform Module

Terraform Registry

Terraform Registry includes ready-to-use modules for various common purposes - they can serve as larger building-blocks for the infrastructure.

https://registry.terraform.io/

State

https://www.terraform.io/docs/state/index.html

The normal Terraform workflow consists of reading configuration, which is essentially codified infrastructure in the form of .tf files, and enacting the specification by instantiating or changing managed resources. Terraform modifies the state of the platform it acts upon. In principle, there should be no need to represent that state separately, as it is reflected in the managed resources themselves. However, querying the resources every time their state is needed could be impractical and inefficient, especially when the size of the problem is large.

The solution Terraform came up with is to represent and cache the managed resources' state. This representation, which sits between the configuration and the real-world instantiation of that configuration, is known as the Terraform state. Terraform uses the state to map real-world resources to configuration, and the cache improves performance for large infrastructures. Terraform uses this state to create plans prior to applying infrastructure changes and changing the state of the real infrastructure. The state can be explicitly synced with the real infrastructure using the terraform refresh command.

From an implementation perspective, the state can be thought of as a database that maps configuration to actual managed resources, by maintaining the association between the configuration name of a resource (aws_instance "my-instance") and the real resource ID, for example the EC2 VM ID i-13df65f04f8d10cce. Alongside the mapping between configuration and remote objects, Terraform maintains metadata such as resource dependencies in the state. Normally, a dependency relationship is defined in configuration, but if the user modifies the configuration and deletes one of the ends of the relationship, the real world needs to be "adjusted", and the relationship metadata in the state is the only piece of information left that reflects that reality until it is adjusted. To ensure correct operation, Terraform retains a copy of the most recent set of dependencies within the state, so it can still determine the correct order of destruction from the state when the operator deletes one or more items from the configuration. Terraform also stores other metadata, such as a pointer to the provider configuration that was most recently used with the resource in situations where multiple aliased providers are present.

In addition to basic mapping, Terraform stores a cache of the attribute values for all resources in the state. This is an optional feature and it is done only as a performance improvement, because for larger infrastructures, querying every resource every time it is needed is too slow. Many cloud providers do not provide APIs to query multiple resources at once, and the round trip time for each resource is hundreds of milliseconds. On top of this, cloud providers almost always have API rate limiting so Terraform can only request a certain number of resources in a period of time. In these scenarios, the cached state is treated as the record of truth.

State Format

The state is in JSON format. Terraform promises backward compatibility with the state file. It provides a "version" field on the state contents that allows the implementation to transparently move the format forward.

Sharing State

The state, as explained above, represents the source of truth for Terraform operations, so when more than one user concurrently interacts with a configuration and the set of resources provisioned from that configuration, it is important for everyone to be working with the same state so that operations will be applied to the same remote objects. In this situation, properly sharing state becomes important. Some backends - the maintainers of state - will lock the state for writing.

State Operations

When running terraform plan, Terraform relies on its state to know the actual state of the resources involved in planning. By default, for every terraform plan and terraform apply, Terraform syncs all resources in the state. Even though the state is JSON, editing the file directly is discouraged: the terraform state command provides a CLI for basic modifications. Terraform manages state backups automatically: when the state is maintained locally, it creates terraform.tfstate.backup.

Local State

By default, the state is stored locally in the root module directory, as a JSON file named terraform.tfstate.

Remote State

Alternatively, state could be stored remotely and shared on a remote backend.

Backend

A backend is the maintainer of state.

Local Backend
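The local backend is what Terraform uses when no backend is configured. It can also be declared explicitly; the sketch below only restates the default state file location:

```hcl
terraform {
  # Equivalent to the default behavior: the state is kept in a local file.
  backend "local" {
    path = "terraform.tfstate" # relative to the root module directory
  }
}
```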

.terraform Directory

The .terraform directory is created by default in the root module by the terraform init command, and has the following structure:

.terraform
    └── plugins
        └── darwin_amd64
            ├── lock.json
            └── terraform-provider-aws_v2.49.0_x4

.terraform contains the plugins subdirectory.

.terraform.tfstate.lock.info

The file is created by default in the root module while an operation that modifies the state is underway.

{
  "ID":"121f201d-a963-35fd-5d93-9061a4168511",
  "Operation":"OperationTypeApply",
  "Info":"",
  "Who":"ovidiufeodorov@Ovidiu-Feodorov.local",
  "Version":"0.12.13",
  "Created":"2020-02-20T02:38:55.865036Z",
  "Path":"terraform.tfstate"
}

terraform.tfstate

The file is created in the root module by default. It is a JSON file.

terraform.tfstate.backup

The file is created in the root module by default, and holds the previous version of the state.

Remote Backend

Remote backends are designed to maintain and share remote state. With a fully-featured remote state backend, Terraform can use remote locking as a measure to avoid two or more different users accidentally running Terraform at the same time, and thus ensure that each Terraform run begins with the most recent updates.
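A sketch of a remote backend configuration using the S3 backend, with locking provided by a DynamoDB table. The bucket, key, region and table names are hypothetical; the table is expected to have a string primary key named LockID.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical bucket name
    key            = "path/to/terraform.tfstate" # hypothetical object key for the state
    region         = "us-east-1"                 # hypothetical region
    encrypt        = true                        # server-side encryption of the state object
    dynamodb_table = "example-terraform-locks"   # hypothetical table; enables state locking
  }
}
```

After adding or changing a backend block, terraform init has to be re-run so Terraform can initialize the backend and, if needed, migrate the existing state.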

State Import

https://www.terraform.io/docs/import/index.html

Terraform is able to import existing infrastructure, which allows resources created by some other means to be brought under Terraform management.
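In Terraform 0.12, import only writes the state; it does not generate configuration, so a matching resource block has to be written by hand before running the import command. A sketch, reusing the example EC2 instance ID from the State section; the AMI ID and instance type are hypothetical and must be adjusted to match the real instance:

```hcl
# Written by hand before running:
#   terraform import aws_instance.example i-13df65f04f8d10cce
resource "aws_instance" "example" {
  # Fill in the arguments to match the existing instance; a terraform plan
  # after the import shows any remaining differences.
  ami           = "ami-0123456789abcdef0" # hypothetical placeholder AMI ID
  instance_type = "t2.micro"              # hypothetical instance type
}
```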

terraform.tfstate State File

A state file is created when the configuration is first applied. It is maintained in the root of the project as terraform.tfstate. State is used to create plans and manage changes to infrastructure. Prior to any operation, the state is refreshed from the real infrastructure – making the state the source of truth. The content of the file can be inspected with terraform show.

What about state.tfstate? Is this a standard file? What role does it play?

Backend

https://www.terraform.io/docs/backends/

S3 Backend

S3 Backend

Remote State

https://www.terraform.io/docs/state/remote.html

It is recommended to set up remote state.

State Locking

https://www.terraform.io/docs/state/locking.html

Workspace

https://www.terraform.io/docs/state/workspaces.html