Terraform Concepts
Revision as of 23:53, 20 February 2020
Internal
Overview
Terraform is a tool for building, changing and managing infrastructure as code. It uses a configuration language named Hashicorp Configuration Language (HCL). Terraform is platform-agnostic: it uses different provider APIs for resource provisioning, via plug-ins, so a heterogeneous environment can be managed with the same workflow.
Workflow
The typical Terraform workflow is:
- Scope - define what resources are needed.
- Author - create the configuration file in HCL.
- Initialize – run terraform init in the project directory with the config files. This will download the correct provider plugins.
- Plan and Apply - terraform plan (verification) and then terraform apply.
Configuration
The set of files used to describe infrastructure is known as Terraform configuration. Configuration files have a .tf extension.
"Configuration" is an important concept, and Hashicorp documentation refers to it repeatedly. A somewhat appropriate synonym for it would be "infrastructure project". Terraform was built to help manage and enact change: the configuration is changed locally, and Terraform builds an execution plan that only modifies what is necessary to reach the desired state. Both configuration and state can be version-controlled. Changes in configuration are also "applied" with terraform apply.
Hashicorp Configuration Language (HCL)
HCL is human-readable. Configuration can also be written in JSON, but JSON is only recommended when the configuration is generated by a machine. Internally, HCL is the declarative language that drives the provider APIs for resource provisioning. It supports input variables, output variables, etc.
Provider
A provider is responsible for creating and managing resources. Terraform uses provider plug-ins to translate its configuration into API instructions for the provider. A provider is specified in a "provider" block in a configuration file. Multiple provider blocks can exist in a Terraform configuration file.
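As a minimal sketch, an AWS provider block might look like this (the region value is an arbitrary example):

```hcl
# Configures the AWS provider plug-in; all aws_* resources
# in this configuration will be provisioned through it.
provider "aws" {
  region = "us-east-1"  # example region, adjust as needed
}
```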
Provider Plug-In
Provider-specific resources are managed with provider plugins. Each provider plugin is an encapsulated binary, distributed separately from Terraform. They are downloaded by terraform init and stored in a subdirectory of the current working directory.
Available Providers
AWS
Kubernetes
Helm
Resource
A Terraform resource represents an actual resource that exists in the infrastructure. A resource can be a physical component, such as an EC2 instance, or a logical resource, such as an application. A Terraform resource has a type and a name. In a configuration file, a resource is described in a "resource" block.
The primary kind of resource, declared by a resource block, is known as a managed resource. A managed resource is different from a data resource, which provides read-only data exposed as a data source. Both kinds of resources take arguments and export attributes for use in configuration, but while managed resources cause Terraform to create, update, and delete infrastructure objects, data resources cause Terraform only to read objects. For brevity, managed resources are often referred to just as "resources" when the meaning is clear from context.
resource "resource-type" "local-name" { ... }
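For illustration, a managed resource of type aws_instance with the local name "example" might be declared as follows (the AMI ID is a hypothetical placeholder):

```hcl
# Type "aws_instance" and name "example" together identify
# this resource within the module.
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"
}
```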
Resource Type
The resource type and name together serve as an identifier for a given resource and so must be unique within a module.
Resource Name
The resource name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The resource type and name together serve as an identifier for a given resource and so must be unique within a module.
Resource Syntax
Resource Dependencies
Resource parameters may use information from other resources. This relationship is expressed syntactically via an interpolation expression.
instance = aws_instance.example.id
If the resources are not dependent, they can be created in parallel, which will be done by Terraform whenever possible.
Implicit Dependency
Implicit dependencies via interpolation expressions are the primary way to inform Terraform about these relationships and should be used whenever possible.
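A sketch of an implicit dependency: the Elastic IP below references the instance's id, so Terraform knows it must create the instance first (the resource names and AMI ID are hypothetical):

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"
}

resource "aws_eip" "ip" {
  # The interpolation below creates an implicit dependency
  # on aws_instance.example.
  instance = aws_instance.example.id
}
```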
Explicit Dependency
Explicit dependencies are expressed with "depends_on". They are needed when the dependency exists in the application code, where Terraform cannot see it, so it has to be explicitly mirrored in the infrastructure configuration.
depends_on = [aws_s3_bucket.example]
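A sketch of an explicit dependency, assuming an application on the instance that expects the S3 bucket to already exist (all names are hypothetical):

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "example-application-bucket"  # hypothetical bucket name
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"

  # The application code on this instance uses the bucket, a
  # relationship Terraform cannot infer, so it is declared explicitly.
  depends_on = [aws_s3_bucket.example]
}
```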
Tainted Resource
When provisioning fails, resources are marked as "tainted". Resources can be manually tainted with the “taint” command. This command does not modify infrastructure, but it modifies the state file to mark the resource as tainted – the next plan will show that the resource will be destroyed and recreated.
Data Source
A data source allows data to be fetched or computed for use in Terraform configuration, in a read-only manner, from a data resource. The underlying resource is queried, but not created, updated or destroyed, unlike in the managed resource case. Use of data sources allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration. A data source is accessed via a special kind of resource known as a data resource, declared with a data block.
data "data-source-name" "local-name" { ... }
The data block requests Terraform to read from a given data source ("aws_ami") and export the result under the given local name. The name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The data source and name together serve as an identifier for a given resource and so must be unique within a module.
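As an illustration, a data resource that looks up the most recent AMI matching a name pattern (the owner account ID and filter values are hypothetical):

```hcl
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # hypothetical owner account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-bionic-*"]  # hypothetical pattern
  }
}

# Elsewhere in the same module, the result is referenced as:
#   data.aws_ami.ubuntu.id
```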
Provisioning
In this context, provisioning means initializing the resources created by the "apply" step by performing software provisioning. Another name for provisioning is instance initialization.
Provisioner
A provisioner uploads files, runs shell scripts, and installs and triggers other software, such as configuration management tools. A provisioner is only run when the resource is created. The provisioner is declared inside a resource block with the "provisioner" keyword.
resource "aws_instance" "example" {
  …
  provisioner "local-exec" {
    command = "echo ${aws_instance.example.public_ip} > ip_address.txt"
  }
}
Multiple provisioner blocks can be added.
Failed Provisioner
If a resource is successfully created but fails during provisioning, it is marked as “tainted”.
Available Provisioners
- local-exec
- remote-exec (via ssh, specified with a “connection” keyword) https://learn.hashicorp.com/terraform/getting-started/provision#defining-a-provisioner
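A sketch of a remote-exec provisioner with its "connection" block (the login user, key path and command are assumptions):

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]  # example initialization command

    connection {
      type        = "ssh"
      user        = "ubuntu"               # assumed login user
      host        = self.public_ip
      private_key = file("~/.ssh/id_rsa")  # hypothetical key path
    }
  }
}
```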
Module
A module is a self-contained package of Terraform configuration that is managed as a group. Modules are used to create reusable components and to treat pieces of infrastructure as a black box. There has been a change in module semantics in Terraform 0.12. Modules can be nested to decompose complex systems into manageable components. A module may include automated tests, examples and documentation.

A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers. Hashicorp documentation recommends against writing modules that are just thin wrappers around existing resources. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and is adding unnecessary complexity; just use the resource type directly in the calling module instead.
Root Module
When terraform apply is executed, all .tf files in the directory Terraform is invoked from form the root module. The root module may call other modules and connect them together by passing output values from one to input values of another. The .terraform directory is created by default in the root module directory by terraform init. The local state file terraform.tfstate is also placed by default in the root module directory.
Using a Module
To call a module from its dependent module means to include the contents of that module into the configuration, with specific values for its input variables. The intention to call (or use) a module is declared in a module block, specified within the dependent module, which contains the source and a set of input values, which are listed in the module's "Inputs" documentation. The only required attribute is source, which tells Terraform where the dependency module can be retrieved from. It is also highly recommended to specify the module's version. Terraform automatically downloads and manages modules. Terraform can retrieve modules from a variety of sources, including the local filesystem, Terraform Registry, private module registries, Git and HTTP. For more details see Accessing a Remote Module below.
terraform {
  required_version = "0.11.11"
}

provider "aws" {
  ...
}

module "consul" {
  source      = "hashicorp/consul/aws"
  version     = "0.7.3"
  num_servers = "3"
}
Accessing a Remote Module
Terraform can retrieve modules from a variety of remote sources, including Terraform Registry, private module registries, GitHub, Git and HTTP.
GitHub
source = "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3"
TODO This did not work, more research is needed. A likely cause: a subdirectory within a Git-based module source must be separated from the repository path with a double slash, e.g. github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.7.3. The error message was:
Error: Failed to download module Could not download module "test" (root.tf:2) source code from "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3": subdir "modules/consul-cluster%253Fref=v0.7.3" not found
Module Examples (from Terraform Registry)
Using a Module
TODO
Module Versioning
Module Syntax
Module Initialization
If a module is referred to in configuration, it is necessary to run - or re-run - terraform init, which obtains and installs the new module's source code.
Module Outputs
A module's outputs are values produced by the module, for example the ID of each resource it creates. They are referenced as:
${module.module-name.output-name}
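For example, a module might export the ID of an instance it creates (the names below are hypothetical):

```hcl
# Inside the module (e.g. in an outputs.tf file):
output "instance_id" {
  value = aws_instance.example.id
}
```

The calling module can then read the value as ${module.module-name.instance_id}.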
Module Destruction
When terraform destroy is run, all resources created by the module will be destroyed.
Writing a Module
Terraform Registry
Terraform Registry includes ready-to-use modules for various common purposes - they can serve as larger building-blocks for the infrastructure.
State
The normal Terraform workflow consists in reading configuration, which is essentially codified infrastructure in form of .tf files, and enacting the specification by instantiating or changing managed resources. Terraform modifies the state of the platform it acts upon. Normally, there should be no need to represent that state, as it would be reflected in the managed resources themselves. However, accessing resources to read their state every time that state is needed could be impractical and ineffective, especially when the size of the problem is large.
The solution Terraform came up with is to represent and cache the managed resources' state. This representation, which sits between the configuration and the real-world instantiation of that configuration, is known as the Terraform state. The state is used by Terraform to map real-world resources to configuration, and it has been proven to improve performance for large infrastructures. Terraform uses this state to create plans prior to applying infrastructure changes and changing the state of the real infrastructure. The state can be explicitly synced with the terraform refresh command.
From an implementation perspective, the state can be thought of as a database that maps configuration to actual managed resources by maintaining the association between the configuration name of the resource (aws_instance "my-instance") and real resource IDs, for example EC2 VM i-13df65f04f8d10cce. Alongside the mapping between configuration and remote objects, Terraform maintains metadata in the state, such as resource dependencies. Normally, a dependency relationship is defined in configuration, but if the user modifies the configuration and deletes one of the ends of the relationship, the real world needs to be "adjusted", and the relationship metadata in the state is the only piece of information left that reflects that reality until the reality is adjusted. To ensure correct operation, Terraform retains a copy of the most recent set of dependencies within the state. Terraform can then still determine the correct order of destruction from the state when the operator deletes one or more items from the configuration. Terraform also stores other metadata, such as a pointer to the provider configuration that was most recently used with the resource, in situations where multiple aliased providers are present.
In addition to basic mapping, Terraform stores a cache of the attribute values for all resources in the state. This is an optional feature and it is done only as a performance improvement, because for larger infrastructures, querying every resource every time it is needed is too slow. Many cloud providers do not provide APIs to query multiple resources at once, and the round trip time for each resource is hundreds of milliseconds. On top of this, cloud providers almost always have API rate limiting so Terraform can only request a certain number of resources in a period of time. In these scenarios, the cached state is treated as the record of truth.
When running a terraform plan, Terraform relies on its state to know the actual state of the resources involved in planning. By default, for every terraform plan and terraform apply, Terraform will sync all resources in the state.
Local State
By default, the state is stored locally in the root module directory, as a JSON file named terraform.tfstate.
Remote State
Alternatively, state could be stored remotely and shared on a remote backend. Remote state is useful when more than one operator is assumed to be interacting - concurrently - with the infrastructure built from shared configuration. In this situation it is important for everyone to be working with the same state so that operations will be applied to the same remote objects. Remote state is maintained within a remote backend.
Backend
Local Backend
.terraform Directory
The .terraform directory is created by default in the root module by the terraform init command and it has the following structure:
.terraform
└── plugins
└── darwin_amd64
├── lock.json
└── terraform-provider-aws_v2.49.0_x4
.terraform contains the plugins subdirectory.
.terraform.tfstate.lock.info
The file is created by default in the root module while an operation that modifies state is underway.
{
  "ID": "121f201d-a963-35fd-5d93-9061a4168511",
  "Operation": "OperationTypeApply",
  "Info": "",
  "Who": "ovidiufeodorov@Ovidiu-Feodorov.local",
  "Version": "0.12.13",
  "Created": "2020-02-20T02:38:55.865036Z",
  "Path": "terraform.tfstate"
}
terraform.tfstate
The file is created in the root module by default. It is a JSON file.
terraform.tfstate.backup
The file is created in the root module by default. It holds the previous version of the state.
Remote Backend
Remote backends are designed to maintain and share remote state. With a fully-featured remote state backend, Terraform can use remote locking as a measure to avoid two or more different users accidentally running Terraform at the same time, and thus ensure that each Terraform run begins with the most recent updates.
terraform.tfstate State File
A state file is created when the project is first initialized. It is maintained in the root of the project as terraform.tfstate. State is used to create plans and manage changes to infrastructure. Prior to any operation, the state is refreshed from the real infrastructure, making the state the source of truth. The content of the file can be inspected with terraform show.
What about state.tfstate? Is this a standard file? What role does it play?
Backend
S3 Backend
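A minimal sketch of an S3 backend configuration, assuming a pre-existing bucket and, for state locking, a DynamoDB table (both names are hypothetical):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"      # hypothetical bucket name
    key            = "prod/terraform.tfstate"  # path of the state object in the bucket
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"         # hypothetical table used for state locking
  }
}
```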
Remote State
It is recommended to set up remote state.