Terraform Concepts
Internal
Overview
Terraform is a tool for building, changing and managing infrastructure as code. It uses a configuration language named HashiCorp Configuration Language (HCL). Terraform is platform agnostic, which it achieves by using different provider APIs for resource provisioning, via plug-ins. A heterogeneous environment can therefore be managed with the same workflow.
Workflow
The typical Terraform workflow is:
- Scope - define what resources are needed.
- Author - create the configuration file in HCL.
- Initialize - run terraform init in the project directory that contains the configuration files. This downloads the required provider plugins.
- Plan and Apply - terraform plan (verification) and then terraform apply.
Configuration
The set of files used to describe infrastructure is known as Terraform configuration. Configuration files have a .tf extension.
"Configuration" is an important concept, and HashiCorp documentation refers to it repeatedly. A somewhat appropriate synonym for it would be "infrastructure project". Terraform was built to help manage and enact change. The configuration is changed locally and Terraform builds an execution plan that modifies only what is necessary to reach the desired state. Configuration and *state* can be version controlled. How? Changes in configuration are also “applied” with terraform apply.
HashiCorp Configuration Language (HCL)
HCL is human-readable. Configuration can also be written in JSON, but JSON is only recommended when the configuration is generated by a machine. HCL is the declarative language that drives the provider APIs for resource provisioning. It contains support for input variables, output variables, etc. For more details, see:
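As a minimal sketch of the input and output variable support mentioned above, assuming Terraform 0.12 syntax. The names region and selected_region are hypothetical and used only for illustration:

```hcl
# Input variable: a value supplied by the caller, with a default.
variable "region" {
  type        = string
  description = "Region to deploy into (hypothetical example variable)"
  default     = "us-east-1"
}

# Output value: exposed after terraform apply completes.
output "selected_region" {
  value = var.region
}
```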
Provider
A provider is responsible for creating and managing resources. Terraform uses provider plug-ins to translate its configuration into API instructions for the provider. A provider is specified in a "provider" block in a configuration file. Multiple provider blocks can exist in a Terraform configuration file.
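A sketch of what multiple provider blocks might look like, assuming the AWS provider. The regions and the alias name west are illustrative assumptions:

```hcl
# Default configuration of the AWS provider.
provider "aws" {
  region = "us-east-1"
}

# A second configuration of the same provider, distinguished by an alias.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}
```

A resource can select the aliased configuration with provider = aws.west. Blocks for different providers (for example aws and kubernetes) can be declared side by side in the same way.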
Provider Plug-In
Provider-specific resources are managed with provider plugins. Each provider plugin is an encapsulated binary, distributed separately from Terraform. Plugins are downloaded by terraform init and stored in a subdirectory of the current working directory.
Available Providers
AWS
Kubernetes
Helm
Resource
A Terraform resource represents an actual resource that exists in the infrastructure. A resource can be a physical component, such as an EC2 instance, or a logical resource, such as an application. A Terraform resource has a type and a name. In a configuration file, a resource is described in a "resource" block.
The primary kind of resource, declared by a resource block, is known as a managed resource. A managed resource is different from a data resource, which provides read-only data exposed as a data source. Both kinds of resources take arguments and export attributes for use in configuration, but while managed resources cause Terraform to create, update, and delete infrastructure objects, data resources cause Terraform only to read objects. For brevity, managed resources are often referred to just as "resources" when the meaning is clear from context.
resource "resource-type" "local-name" { ... }
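For illustration, a managed resource of type aws_instance with the local name example might be declared as follows. This is a sketch; the AMI ID is a hypothetical placeholder:

```hcl
# Resource type "aws_instance", local name "example".
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"
}
```

Elsewhere in the same module, this resource is referred to as aws_instance.example.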
Resource Type
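The resource type identifies the kind of infrastructure object the resource manages, for example aws_instance. The resource type and name together serve as an identifier for a given resource and so must be unique within a module.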
Resource Name
The resource name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The resource type and name together serve as an identifier for a given resource and so must be unique within a module.
Resource Syntax
Resource Dependencies
Resource parameters may use information from other resources. This relationship is expressed syntactically via an interpolation expression.
instance = aws_instance.example.id
If the resources are not dependent, they can be created in parallel, which will be done by Terraform whenever possible.
Implicit Dependency
Implicit dependencies via interpolation expressions are the primary way to inform Terraform about these relationships and should be used whenever possible.
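A minimal sketch of an implicit dependency, assuming the AWS provider. The AMI ID is a hypothetical placeholder:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"
}

# Referencing aws_instance.example.id creates an implicit dependency:
# Terraform creates the instance before the Elastic IP.
resource "aws_eip" "ip" {
  instance = aws_instance.example.id
}
```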
Explicit Dependency
Explicit dependencies are expressed with “depends_on”. They are needed when a dependency is not visible to Terraform, for example when it is configured only inside the application code, and so has to be explicitly mirrored in the infrastructure configuration.
depends_on = [aws_s3_bucket.example]
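In context, the depends_on argument might be used as in the sketch below. The bucket name and AMI ID are hypothetical:

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "example-app-bucket" # hypothetical bucket name
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"

  # The application on this instance expects the bucket to exist, but no
  # argument references it, so the dependency is declared explicitly.
  depends_on = [aws_s3_bucket.example]
}
```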
Tainted Resource
When provisioning fails, resources are marked as "tainted". Resources can be manually tainted with the “taint” command. This command does not modify infrastructure, but it modifies the state file to mark the resource as tainted – the next plan will show that the resource will be destroyed and recreated.
Data Source
A data source allows data to be fetched or computed for use in Terraform configuration, in a read-only manner, from a data resource. The underlying resource is queried, but not created, updated or destroyed, unlike in the managed resource case. Use of data sources allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration. A data source is accessed via a special kind of resource known as a data resource, declared with a data block.
data "data-source-name" "local-name" { ... }
The data block requests Terraform to read from a given data source (for example "aws_ami") and export the result under the given local name. The name is used to refer to this resource from elsewhere in the same Terraform module, but has no significance outside of the scope of a module. The data source and name together serve as an identifier for a given resource and so must be unique within a module.
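A sketch of a data block for the aws_ami data source and of a managed resource that consumes it. The owner and the name pattern in the filter are assumptions chosen for illustration:

```hcl
# Read (never create) the most recent Amazon-owned AMI matching a name pattern.
data "aws_ami" "example" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"] # assumed name pattern
  }
}

# Exported attributes are referenced as data.<data-source>.<local-name>.<attribute>.
resource "aws_instance" "example" {
  ami           = data.aws_ami.example.id
  instance_type = "t2.micro"
}
```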
Provisioning
In this context, provisioning means initialization of the resources created by the “apply” step by performing software provisioning. Another name for provisioning is instance initialization.
Provisioner
A provisioner uploads files, runs shell scripts, and installs and triggers other software such as configuration management tools. By default, a provisioner runs only when the resource is created. The provisioner is declared inside a resource block with the “provisioner” keyword.
resource "aws_instance" "example" {
  …
  provisioner "local-exec" {
    # Inside its own resource block, the resource is referenced as "self".
    command = "echo ${self.public_ip} > ip_address.txt"
  }
}
Multiple provisioner blocks can be added.
Failed Provisioner
If a resource is successfully created but fails during provisioning, it is marked as “tainted”.
Available Provisioners
- local-exec
- remote-exec (via ssh, specified with a “connection” keyword) https://learn.hashicorp.com/terraform/getting-started/provision#defining-a-provisioner
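A remote-exec provisioner with its connection block might look like the sketch below. The variable private_key_path, the key pair name, the login user and the AMI ID are all hypothetical:

```hcl
variable "private_key_path" {
  type        = string
  description = "Path to the SSH private key (hypothetical variable)"
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t2.micro"
  key_name      = "example-key" # hypothetical key pair name

  provisioner "remote-exec" {
    # Commands executed on the new instance over SSH.
    inline = ["echo provisioned > /tmp/provisioned.txt"]

    connection {
      type        = "ssh"
      user        = "ec2-user" # assumed login user for the AMI
      host        = self.public_ip
      private_key = file(var.private_key_path)
    }
  }
}
```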
Module
A module is a self-contained package of Terraform configuration that is managed as a group. Modules are used to create reusable components, and treat pieces of infrastructure as a black box. There has been a change in semantics in Terraform 0.12. Modules can be nested to decompose complex systems into manageable components. A module may include automated tests, examples and documentation. A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers. HashiCorp documentation recommends against writing modules that are just thin wrappers around existing resources. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so is adding unnecessary complexity. In that case, use the resource type directly in the calling module instead.
Root Module
When terraform apply is executed, all .tf files in the working directory from which terraform is executed form the root module. The root module may call other modules and connect them together by passing output values from one to input values of another.
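As a sketch of how a root module might connect two modules, assuming two hypothetical local modules, network and app, where network declares an output named subnet_id and app declares an input variable of the same name:

```hcl
# Root module: calls two hypothetical local modules and connects them.
module "network" {
  source = "./modules/network"
}

module "app" {
  source = "./modules/app"

  # The output "subnet_id" of one module becomes an input of the other.
  subnet_id = module.network.subnet_id
}
```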
Using a Module
To call a module from its dependent module means to include the contents of that module into the configuration with specific values for its input variables. The intention to call (or use) a module is declared in a module block, specified within the dependent module, which contains the source and a set of input values, which are listed in the module's "Inputs" documentation. The only required attribute is the source attribute, which tells Terraform where the dependency module can be retrieved. It is also highly recommended to specify the module's version. Terraform automatically downloads and manages modules. Terraform can retrieve modules from a variety of sources, including the local filesystem, Terraform Registry, private module registries, Git and HTTP. For more details see Accessing a Remote Module below.
terraform {
  required_version = "0.11.11"
}
provider "aws" {
  ...
}
module "consul" {
  source = "hashicorp/consul/aws"
  version = "0.7.3"
  num_servers = "3"
}
Accessing a Remote Module
Terraform can retrieve modules from a variety of remote sources, including Terraform Registry, private module registries, GitHub, Git and HTTP.
GitHub
source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.7.3"
A subdirectory within the repository must be separated from the repository path by a double slash (//). With a single slash, as in "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3", the download fails with:
Error: Failed to download module Could not download module "test" (root.tf:2) source code from "github.com/hashicorp/terraform-aws-consul/modules/consul-cluster?ref=v0.7.3": subdir "modules/consul-cluster%253Fref=v0.7.3" not found
Module Examples (from Terraform Registry)
Using a Module
TODO
Module Versioning
Module Syntax
Module Initialization
If a module is referenced in configuration, it is necessary to run, or re-run, terraform init, which obtains and installs the new module's source code.
Module Outputs
A module's outputs are values produced by the module, for example the ID of each resource it creates. An output is referenced from the calling module as:
${module.module-name.output-name}
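A sketch of both sides of a module output, assuming a hypothetical local module web that creates a resource aws_instance.example. The two parts live in different directories and are shown together only for brevity:

```hcl
# Inside the module (for example ./modules/web/outputs.tf): declare the output.
output "instance_id" {
  value = aws_instance.example.id
}

# In the calling module: call the module and read its output.
module "web" {
  source = "./modules/web"
}

output "web_instance_id" {
  value = module.web.instance_id
}
```

In Terraform 0.12 the reference can be written bare, as module.web.instance_id. Terraform 0.11 requires the interpolation form "${module.web.instance_id}".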
Module Destruction
When terraform destroy is run, all resources created by the module are destroyed.
Writing a Module
Terraform Registry
Terraform Registry includes ready-to-use modules for various common purposes - they can serve as larger building-blocks for the infrastructure.
State
The normal Terraform workflow consists of reading configuration, which represents codified infrastructure, and enacting this specification by instantiating resources. Terraform modifies the state of the platform it acts upon. In principle, there should be no need to represent that state separately, as it is reflected by the resources Terraform changed. However, querying every resource each time the state is needed is impractical and inefficient, especially when the infrastructure is large, so Terraform records the state in a state file.
Relationship between state and configuration.
Local Files
.terraform Directory
The .terraform directory is created by default in the root module by the terraform init command and has the following structure:
.
└── plugins
    └── darwin_amd64
        ├── lock.json
        └── terraform-provider-aws_v2.49.0_x4
.terraform contains the plugins subdirectory.
.terraform.tfstate.lock.info
The file is created by default in the root module while an operation that modifies state is underway.
{
"ID":"121f201d-a963-35fd-5d93-9061a4168511",
"Operation":"OperationTypeApply",
"Info":"",
"Who":"ovidiufeodorov@Ovidiu-Feodorov.local",
"Version":"0.12.13",
"Created":"2020-02-20T02:38:55.865036Z",
"Path":"terraform.tfstate"
}
terraform.tfstate
The file is created in the root module by default.
terraform.tfstate.backup
The file is created in the root module by default.
What is it?
Why is it needed?
Where it is maintained: locally or remotely.
Backends
Workspace.
Need for state.
terraform.tfstate State File
A state file is created when the configuration is first applied. It is maintained in the root of the project as terraform.tfstate. State is used to create plans and manage changes to infrastructure. Prior to any operation, the state is refreshed from the real infrastructure, making the state the source of truth. The content of the file can be inspected with terraform show.
What about state.tfstate. Is this a standard file? What role does it play?
Backend
S3 Backend
Remote State
It is recommended to set up remote state.
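As a sketch of remote state kept in an S3 backend. The bucket name, key and region are hypothetical, and the bucket is assumed to already exist:

```hcl
terraform {
  # Store terraform.tfstate in S3 instead of the local working directory.
  backend "s3" {
    bucket = "example-terraform-state"   # hypothetical bucket name
    key    = "project/terraform.tfstate" # hypothetical object key
    region = "us-east-1"
  }
}
```

After a backend block is added or changed, terraform init has to be re-run so that Terraform can configure the backend.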