Infrastructure Code Testing Concepts: Difference between revisions

From NovaOrdis Knowledge Base
 
(37 intermediate revisions by the same user not shown)
Line 5: Line 5:


=Overview=
=Overview=
A practice recommended when developing infrastructure as code is to [[Infrastructure_as_Code_Concepts#Continuously_Test_and_Deliver|continuously test and deliver]], same as for application software development.
[[Infrastructure_as_Code_Concepts#Continuously_Test_and_Deliver|Continuous testing and delivery]] should be used for infrastructure code, the same as for application software development.


Continuously testing [[Infrastructure_as_Code_Concepts#Small_Pieces|small pieces]] encourages a modular, loosely coupled design and it helps finding problems sooner, then quickly iterating, fixing and rebuilding the problematic code. This process yields better infrastructure. The fact that test suite such developed remains with the code base and is continuously exercises as part of CD runs is referred to as "building quality in" rather than "testing quality in". Finding and fixing problems continuously avoids the accumulation of technical debt.
Continuously testing [[Infrastructure_as_Code_Concepts#Small_Pieces|small pieces]] encourages a modular, loosely coupled design and helps find problems sooner, so the problematic code can be quickly fixed and rebuilt. This process yields better infrastructure. The fact that the test suite developed as part of this process remains with the code base and is continuously exercised as part of CD runs is referred to as "building quality in" rather than "testing quality in". Finding and fixing problems continuously avoids the accumulation of technical debt.


Even if that the word "infrastructure" may suggest that it is built once, and then forgotten, this is far from the truth. Infrastructure needs contestant change: patching, upgrading, fixing and improving. Every time the infrastructure is modified, automated tests decrease the likeliness that something will break. This is why building the delivery and testing systems within the primary system is a good idea. If that is done properly, "going live" is almost an arbitrary event, a change in who is using the system, but not how the system is managed.
Even though the word "infrastructure" may suggest that it is built once and then forgotten, this is far from the truth. Infrastructure undergoes constant change: patching, upgrading, fixing and improving. Every time the infrastructure is modified, automated tests decrease the likelihood that something will break. This is why building the delivery and testing systems within the primary system is a good idea. If that is done properly, "going live" is almost an arbitrary event, a change in who is using the system, but not how the system is managed.


=The Infrastructure Test Diamond=
=The Infrastructure Test Diamond=
The [[Software_Testing_Concepts#Test_Pyramid|test pyramid]] is a good model for application software testing, but does not apply very well to infrastructure code testing. Low level offline unit tests are [[#Declarative_Code_Tests_Often_Have_Low_Value|not very valuable for declarative code]] so we don't need so many of them. The testing model for infrastructure code looks like a diamond:
The [[Software_Testing_Concepts#Test_Pyramid|test pyramid]] is a good model for application software testing, but does not apply very well to infrastructure code testing. Low level offline unit tests are [[#Declarative_Code_Tests_Often_Have_Low_Value|not very valuable for declarative code]] so we don't need so many of them. The testing model for infrastructure code looks more like a diamond:
:::[[File:Test_Diamond_For_Infrastructure_Tests.png|343px]]
:::[[File:Test_Diamond_For_Infrastructure_Tests.png|343px]]
The pyramid may make sense for imperative infrastructure code.
A pyramid model may make sense for imperative infrastructure code, though.


=Infrastructure Test Categories=
=Infrastructure Test Categories=
The [[Infrastructure_as_Code_Concepts#Stack|stack]] mentioned below is a collection of infrastructure resources that are defined, changed and managed together as a unit.
A [[Infrastructure_as_Code_Concepts#Stack|stack]] is a collection of infrastructure resources that are defined, changed and managed together as a unit.
==<span id='Offline'></span>Offline Stack Tests==
==<span id='Offline'></span>Offline Stack Tests==
An offline test runs on the pipeline agent node and does not require external infrastructure or external service (as a database). The offline tests run quickly and validate the correctness of components in isolation. Loosely coupled systems encourage offline testing - their components are cleanly decoupled and dependencies can be replaced with test doubles that work offline. An offline test proves that the component is cleanly decoupled.  
An offline test runs on the pipeline agent node, does not make outgoing calls and does not require external infrastructure or an external service (such as a database). Offline tests run quickly and validate the correctness of components in isolation. Loosely coupled systems encourage offline testing: their components are cleanly decoupled and dependencies can be replaced with test doubles that work offline. The mere existence of an offline test proves that the component is cleanly decoupled. Offline tests are usually run as part of the [[Infrastructure_Code_Continuous_Delivery_Concepts#Build|build activity]] of the continuous delivery pipeline.
===Syntax Checking===
===Syntax Checking===
Most infrastructure tools provide a "dry run" command that parses the code without applying it to infrastructure. The command exits with an error if there is a syntax error and can be used in offline tests, as it does not need access to the dependencies or the infrastructure platform.
Most infrastructure tools provide a "dry run" command that parses the code without applying it to infrastructure. The command exits with an error if there is a syntax error and can be used in offline tests, as it does not need access to the dependencies or the infrastructure platform.
===Static Code Analysis===
===Static Code Analysis===
Some infrastructure tools can parse and analyze source code for a wider class of issues than just syntax (coding errors, confusing or poor coding style, adherence to a code style policy), but still without connecting to the infrastructure platform. This analysis is often called linting. Some tools can even modify code to comply with a certain style. Static code analysis tools:
Some infrastructure tools can parse and analyze source code for a wider class of issues than just syntax (coding errors, confusing or poor coding style, adherence to a code style policy), but still without connecting to the infrastructure platform. This analysis is often called "linting". Some tools can even modify code to comply with a certain style. Static code analysis tools:
* [[tflint#Overview|Terraform tflint]]
* [[tflint#Overview|Terraform tflint]]
* [[CloudFormation Linter#Overview|CloudFormation Linter]]
* [[CloudFormation Linter#Overview|CloudFormation Linter]]
* [[checkov#Overview|checkov]]
* [[checkov#Overview|checkov]]
Depending on the tool, some static code analysis checks may connect to the cloud platform API to check for conflicts with what the platform supports.
Depending on the tool, some static code analysis checks may connect to the cloud platform API to check for conflicts with what the platform supports.
===Static Security Analysis===
===Static Security Analysis===
A specialized form of static analysis is static security analysis:
* [[cfn_nag#Overview|cfn_nag]]
* [[cfn_nag#Overview|cfn_nag]]
* [[tfsec#Overview|tfsec]]
* [[tfsec#Overview|tfsec]]
===Testing with Mock APIs and Doubles===
===Testing with Mock APIs and Doubles===
An online test can be turned into an offline test if it accept integration with a [[#TD|test double]] or it can use a mock API, such as those provided by [[#Cloud_Mocking_Tools|cloud mocking tools]]. This type of testing is less valuable for declarative code for reasons described in the "[[#Declarative_Code_Tests_Often_Have_Low_Value|Declarative Code Tests Often Have Low Value]]" section, but they can be useful for unit testing imperative code.
An online test can be turned into an offline test if it accepts integration with a [[#TD|test double]] or can use a mock API, such as those provided by [[#Cloud_Mocking_Tools|cloud mocking tools]]. This type of testing is less valuable for declarative code, for reasons described in the "[[#Declarative_Code_Tests_Often_Have_Low_Value|Declarative Code Tests Often Have Low Value]]" section, but it can be useful for unit testing imperative code. Tests executed against test doubles are usually run as part of the [[Infrastructure_Code_Continuous_Delivery_Concepts#Build|build activity]] of the continuous delivery pipeline.


==<span id='Online'></span>Online Stack Tests==
==<span id='Online'></span>Online Stack Tests==
An online test involves using the infrastructure platform to create and interact with an instance of the stack. The test still focuses on a single stack instance, unlike a [[#System|system test]]. This type of test is slower than an [[#Offline|offline test]] but the feedback it provides is more meaningful. Even if the presence of the infrastructure platform is required, the stack can be designed in such a way that it does NOT require the presence of other stack instances for testing. Upon its execution, the test makes assertions after the infrastructure in the stack. Frameworks for testing infrastructure resources include:
An online test involves using the infrastructure platform to create and interact with an instance of the stack. The test still focuses on a single stack instance, unlike a [[#System|system test]]. This type of test is slower than an [[#Offline|offline test]] but the feedback it provides is more meaningful. Even if the presence of the infrastructure platform is required, the stack can be designed in such a way that it does NOT require the presence of other stack instances for testing. Upon its execution, the test makes assertions about the infrastructure in the stack instance. Frameworks for testing infrastructure resources include:
* [[Awspec#Overview|Awspec]]
* [[Awspec#Overview|Awspec]]
* [[Clarity#Overview|Clarity]]
* [[Clarity#Overview|Clarity]]
Line 41: Line 44:
* [[Taskcat#Overview|Taskcat]]
* [[Taskcat#Overview|Taskcat]]
* [[Terratest#Overview|Terratest]]
* [[Terratest#Overview|Terratest]]
While some assertions that verify the infrastructure resources have been created are useful, many low-level assertions that verify every configuration element have low value, for reasons described in the "[[#Declarative_Code_Tests_Often_Have_Low_Value|Declarative Code Tests Often Have Low Value]]" section. Assertions are much more useful when the stack code is dynamic and there is embedded logic that might malfunction, as is the case for [[Infrastructure_as_Code_Concepts#Imperative_Languages_for_Infrastructure|imperative languages for infrastructure]].
The most valuable testing is proving that the infrastructure resources do what they should. The test should prove that the infrastructure works correctly. In case the stack has dependencies, they need to be plugged in, even if [[#Test_Fixture|test fixtures]] are used for that. Using test fixtures makes it much easier to manage tests, keep the stacks loosely coupled and have fast feedback loops.
When you need to test a stack that has a [[Infrastructure_as_Code_Concepts#Stack_Dependencies|dependency]] on another stack, that dependency can be simulated with a test double. Typically, the test setup creates a [[#Test_Fixture|test fixture]] that provides the interface the stack is depending on. Designing the stacks to be testable this way makes them more reusable and composable.


==<span id='System'></span>System Tests==
==<span id='System'></span>System Tests==
These tests assume that the entire system, composed of multiple stack instances, is deployed.
These tests assume that the entire system, composed of multiple stack instances, is deployed. They are sometimes referred to as integration tests, or system integration tests.
 
===Sample Application===
A good system test would be to deploy a sample (synthetic) application on the newly provisioned or updated system, and prove that it is working. The advantage of using a sample application rather than a real application is that it can be kept simple and stripped down to a minimal set of dependencies and configurations, so when a test fails, it is highly probable that the failure is caused by issues with the system provisioning, rather than by the complexities of the application. Also see: {{Internal|Software_Testing_Concepts#Smoke_Testing|Software Testing Concepts &#124; Smoke Testing}}
 
=Test Code Location=
The recommended choice is to [[Infrastructure_as_Code_Concepts#Tests|collocate tests with the stack they belong to]]. Progressive testing will ensure each stack is tested using its own tests before the stack is declared as ready for use. The next stage of progressive testing involves integration testing with multiple stacks together. In this case the integration tests can be maintained in the project of the stack that is the obvious entry point for testing (the infrastructure for the front-end service, for example). Most likely, the integration tests are coupled with that service anyway. Integration tests also fit well within the projects of components that consume other components. Dedicated integration test projects may also be considered, one for each integration stage. This approach is common when a different team owns the integration tests. One notable challenge when maintaining separate integration test projects is to ensure that the correct versions are tested together, and that the correct version of the tests is used.
 
=Test Fixture=
A test fixture is an infrastructure resource created specifically to help provision and test a stack instance by itself, without needing to instantiate other stacks. [[Software_Testing_Concepts#Test_Double|Test doubles]] are a type of test fixture. A test fixture is not part of the stack that is being tested. It is additional infrastructure created to support the test, and it represents the stack's dependencies.


=Challenges with Testing Infrastructure Code=
=Challenges with Testing Infrastructure Code=
Line 49: Line 66:
Many infrastructure tools use [[Infrastructure_as_Code_Concepts#Declarative_Infrastructure_Languages|declarative languages]], which express desired state. Testing that all details of the desired state have been changed correctly can soon become very tedious, and in fact amounts to testing the infrastructure tool. One valid testing scenario is to ensure that the change has in fact been applied, but for that, testing a single detail of the end state should be sufficient. In the context of "[[Software_Testing_Concepts#.22Given.2C_When.2C_Then.22_Testing|Given, When, Then]]" tests, "When" can be missing for declarative code tests, which suggests that the code does not create variable outcomes. Many tools and practices for testing dynamic code are not appropriate for declarative code.
Many infrastructure tools use [[Infrastructure_as_Code_Concepts#Declarative_Infrastructure_Languages|declarative languages]], which express desired state. Testing that all details of the desired state have been changed correctly can soon become very tedious, and in fact amounts to testing the infrastructure tool. One valid testing scenario is to ensure that the change has in fact been applied, but for that, testing a single detail of the end state should be sufficient. In the context of "[[Software_Testing_Concepts#.22Given.2C_When.2C_Then.22_Testing|Given, When, Then]]" tests, "When" can be missing for declarative code tests, which suggests that the code does not create variable outcomes. Many tools and practices for testing dynamic code are not appropriate for declarative code.


When variables or conditionals are used with declarative code, it makes sense to test the code with more complex tests (there is a "When" now). However, if the declarative code is complex enough that it needs complex testing, that is a sign that the logic should be pulled out of the declarative section and consolidated into a library written into a procedural language, and tested independently.  
When variables or conditionals are used with declarative code, it makes sense to test the code with more complex tests. There is a "When" now. However, if the declarative code is complex enough that it needs complex testing, that is a sign that the logic should be pulled out of the declarative section and consolidated into a library written in a procedural language, and tested independently.


Another useful test for declarative code is to ensure that the complex infrastructure created or modified by a complex piece of declarative code works as intended, as opposed to checking its state and ensuring the desired state has been transferred correctly to the infrastructure resources.
Another useful test for declarative code is to ensure that the complex infrastructure created or modified by a complex piece of declarative code works as intended, as opposed to checking its state and ensuring the desired state has been transferred correctly to the infrastructure resources.
Line 68: Line 85:
==Dependencies Complicate Testing Infrastructure==
==Dependencies Complicate Testing Infrastructure==
Infrastructure code is particular in that it needs the infrastructure platform and its APIs to work, so the infrastructure platform (or a subset of its APIs) is a required dependency. This may be worked around by using [[Software_Testing_Concepts#Test_Double|test doubles]]. Also, there is a growing number of tools that allow [[#Cloud_Mocking_Tools|mocking the API of the cloud vendors]]. However, it is more useful to use test doubles for other infrastructure components than for the infrastructure platform itself.
Infrastructure code is particular in that it needs the infrastructure platform and its APIs to work, so the infrastructure platform (or a subset of its APIs) is a required dependency. This may be worked around by using [[Software_Testing_Concepts#Test_Double|test doubles]]. Also, there is a growing number of tools that allow [[#Cloud_Mocking_Tools|mocking the API of the cloud vendors]]. However, it is more useful to use test doubles for other infrastructure components than for the infrastructure platform itself.
==Shared Development Environments Tend to Break==
If multiple developers use a shared development environment to test infrastructure changes, that environment tends to get into a bad state because of uncontrolled, concurrent changes. Instead, each developer working with infrastructure code should create their own instance of the infrastructure, and destroy it when it is no longer actively used.
=Local Testing=
People working on infrastructure code, especially when using TDD techniques, should be able to run the tests locally before pushing code into the shared pipeline and environments. <font color=darkkhaki>This may require setting up "personal infrastructure instances". </font> Use the same test orchestration scripts across local work and pipeline stages. Doing this ensures that tests are set up and run consistently everywhere. Do not couple test orchestration to the pipeline, because this makes it difficult to set up and run tests consistently outside the pipeline. Instead, implement your test orchestration in a separate script or tool. The test stage should call this tool, passing a minimum of configuration parameters.


=Cloud Mocking Tools=
=Cloud Mocking Tools=

Latest revision as of 22:08, 7 February 2022

Internal

Overview

Continuous testing and delivery should be used for infrastructure code, the same as for application software development.

Continuously testing small pieces encourages a modular, loosely coupled design and helps find problems sooner, so the problematic code can be quickly fixed and rebuilt. This process yields better infrastructure. The fact that the test suite developed as part of this process remains with the code base and is continuously exercised as part of CD runs is referred to as "building quality in" rather than "testing quality in". Finding and fixing problems continuously avoids the accumulation of technical debt.

Even though the word "infrastructure" may suggest that it is built once and then forgotten, this is far from the truth. Infrastructure undergoes constant change: patching, upgrading, fixing and improving. Every time the infrastructure is modified, automated tests decrease the likelihood that something will break. This is why building the delivery and testing systems within the primary system is a good idea. If that is done properly, "going live" is almost an arbitrary event, a change in who is using the system, but not how the system is managed.

The Infrastructure Test Diamond

The test pyramid is a good model for application software testing, but does not apply very well to infrastructure code testing. Low level offline unit tests are not very valuable for declarative code so we don't need so many of them. The testing model for infrastructure code looks more like a diamond:

[Figure: Test Diamond For Infrastructure Tests]

A pyramid model may make sense for imperative infrastructure code, though.

Infrastructure Test Categories

A stack is a collection of infrastructure resources that are defined, changed and managed together as a unit.

Offline Stack Tests

An offline test runs on the pipeline agent node, does not make outgoing calls and does not require external infrastructure or an external service (such as a database). Offline tests run quickly and validate the correctness of components in isolation. Loosely coupled systems encourage offline testing: their components are cleanly decoupled and dependencies can be replaced with test doubles that work offline. The mere existence of an offline test proves that the component is cleanly decoupled. Offline tests are usually run as part of the build activity of the continuous delivery pipeline.

Syntax Checking

Most infrastructure tools provide a "dry run" command that parses the code without applying it to infrastructure. The command exits with an error if there is a syntax error and can be used in offline tests, as it does not need access to the dependencies or the infrastructure platform.
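As a minimal sketch of an offline syntax check, the following uses Python's standard JSON parser as a stand-in for a tool's dry-run command. The function name check_template_syntax and the sample templates are hypothetical; a real pipeline would invoke the dry-run command of whichever infrastructure tool is in use.

```python
from __future__ import annotations

import json


def check_template_syntax(template_text: str) -> list[str]:
    """Return a list of syntax errors; an empty list means the text parses.

    Stand-in for a tool's dry-run command: it only parses the text
    and never contacts the infrastructure platform.
    """
    try:
        json.loads(template_text)
    except json.JSONDecodeError as err:
        return [f"line {err.lineno}, column {err.colno}: {err.msg}"]
    return []


# Hypothetical templates: the second one is missing its final closing brace.
VALID = '{"Resources": {"Bucket": {"Type": "AWS::S3::Bucket"}}}'
BROKEN = '{"Resources": {"Bucket": {"Type": "AWS::S3::Bucket"}}'
```

A build stage could fail the run whenever the returned list is not empty, which mirrors the non-zero exit code of a dry-run command.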

Static Code Analysis

Some infrastructure tools can parse and analyze source code for a wider class of issues than just syntax (coding errors, confusing or poor coding style, adherence to a code style policy), but still without connecting to the infrastructure platform. This analysis is often called "linting". Some tools can even modify code to comply with a certain style. Static code analysis tools include tflint (for Terraform), CloudFormation Linter and checkov.

Depending on the tool, some static code analysis checks may connect to the cloud platform API to check for conflicts with what the platform supports.

Static Security Analysis

A specialized form of static analysis is static security analysis, performed by tools such as cfn_nag and tfsec.
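To illustrate what a static security check does, here is a small sketch of a single rule that flags security groups allowing SSH from any address. It is not taken from any particular tool; the function name and the simplified, CloudFormation-like template shape are assumptions made for the example.

```python
from __future__ import annotations


def find_open_ssh_rules(template: dict) -> list[str]:
    """Return names of resources that allow SSH (port 22) from any address."""
    findings = []
    for name, resource in template.get("Resources", {}).items():
        rules = resource.get("Properties", {}).get("SecurityGroupIngress", [])
        for rule in rules:
            # A rule covers SSH when port 22 lies inside its port range.
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 0)
            if covers_ssh and rule.get("CidrIp") == "0.0.0.0/0":
                findings.append(name)
                break  # one finding per resource is enough
    return findings


# Hypothetical template: one group is open to the world, the other is not.
TEMPLATE = {
    "Resources": {
        "WebGroup": {
            "Properties": {
                "SecurityGroupIngress": [
                    {"FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"}
                ]
            }
        },
        "DbGroup": {
            "Properties": {
                "SecurityGroupIngress": [
                    {"FromPort": 22, "ToPort": 22, "CidrIp": "10.0.0.0/8"}
                ]
            }
        },
    }
}
```

The check works on the parsed definition only, so it runs offline, in the same way as syntax checking and linting.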

Testing with Mock APIs and Doubles

An online test can be turned into an offline test if it accepts integration with a test double or can use a mock API, such as those provided by cloud mocking tools. This type of testing is less valuable for declarative code, for reasons described in the "Declarative Code Tests Often Have Low Value" section, but it can be useful for unit testing imperative code. Tests executed against test doubles are usually run as part of the build activity of the continuous delivery pipeline.
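A sketch of how a test double lets imperative infrastructure code be unit tested offline, using unittest.mock from the Python standard library. The client interface (list_bucket_names, create_bucket) and both function names are hypothetical and do not correspond to a real cloud SDK.

```python
from __future__ import annotations

from unittest.mock import Mock


def ensure_bucket(client, name: str) -> bool:
    """Create the bucket if it is missing; return True if it was created."""
    if name in client.list_bucket_names():
        return False
    client.create_bucket(name)
    return True


def make_fake_client(existing: list[str]) -> Mock:
    """Build a test double that answers from memory instead of calling a cloud API."""
    client = Mock()
    client.list_bucket_names.return_value = list(existing)
    return client
```

Because the client is passed in as a parameter, the same ensure_bucket function can be given a real client in an online test and the double in an offline one.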

Online Stack Tests

An online test involves using the infrastructure platform to create and interact with an instance of the stack. The test still focuses on a single stack instance, unlike a system test. This type of test is slower than an offline test but the feedback it provides is more meaningful. Even if the presence of the infrastructure platform is required, the stack can be designed in such a way that it does NOT require the presence of other stack instances for testing. Upon its execution, the test makes assertions about the infrastructure in the stack instance. Frameworks for testing infrastructure resources include Awspec, Clarity, Taskcat and Terratest.

While some assertions that verify the infrastructure resources have been created are useful, many low-level assertions that verify every configuration element have low value, for reasons described in the "Declarative Code Tests Often Have Low Value" section. Assertions are much more useful when the stack code is dynamic and there is embedded logic that might malfunction, as is the case for imperative languages for infrastructure.

The most valuable testing is proving that the infrastructure resources do what they should. The test should prove that the infrastructure works correctly. In case the stack has dependencies, they need to be plugged in, even if test fixtures are used for that. Using test fixtures makes it much easier to manage tests, keep the stacks loosely coupled and have fast feedback loops.

When you need to test a stack that has a dependency on another stack, that dependency can be simulated with a test double. Typically, the test setup creates a test fixture that provides the interface the stack is depending on. Designing the stacks to be testable this way makes them more reusable and composable.

System Tests

These tests assume that the entire system, composed of multiple stack instances, is deployed. They are sometimes referred to as integration tests, or system integration tests.

Sample Application

A good system test would be to deploy a sample (synthetic) application on the newly provisioned or updated system, and prove that it is working. The advantage of using a sample application rather than a real application is that it can be kept simple and stripped down to a minimal set of dependencies and configurations, so when a test fails, it is highly probable that the failure is caused by issues with the system provisioning, rather than by the complexities of the application. Also see:

Software Testing Concepts | Smoke Testing
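A smoke test against a sample application can be very small. The sketch below assumes the sample application exposes an HTTP endpoint that returns a known text; the function name, the URL passed in and the expected text are all hypothetical.

```python
import urllib.request


def smoke_test(url: str, expected_text: str, timeout: float = 5.0) -> bool:
    """Return True if the sample application answers HTTP 200 with the expected text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except OSError:
        # Covers connection failures, timeouts and HTTP error statuses.
        return False
```

A system test stage would call this after the sample application is deployed, for example with the address of the newly provisioned load balancer, and fail the stage when the function returns False.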

Test Code Location

The recommended choice is to collocate tests with the stack they belong to. Progressive testing will ensure each stack is tested using its own tests before the stack is declared as ready for use. The next stage of progressive testing involves integration testing with multiple stacks together. In this case the integration tests can be maintained in the project of the stack that is the obvious entry point for testing (the infrastructure for the front-end service, for example). Most likely, the integration tests are coupled with that service anyway. Integration tests also fit well within the projects of components that consume other components. Dedicated integration test projects may also be considered, one for each integration stage. This approach is common when a different team owns the integration tests. One notable challenge when maintaining separate integration test projects is to ensure that the correct versions are tested together, and that the correct version of the tests is used.

Test Fixture

A test fixture is an infrastructure resource created specifically to help provision and test a stack instance by itself, without needing to instantiate other stacks. Test doubles are a type of test fixture. A test fixture is not part of the stack that is being tested. It is additional infrastructure created to support the test, and it represents the stack's dependencies.

Challenges with Testing Infrastructure Code

Declarative Code Tests Often Have Low Value

Many infrastructure tools use declarative languages, which express desired state. Testing that all details of the desired state have been changed correctly can soon become very tedious, and in fact amounts to testing the infrastructure tool. One valid testing scenario is to ensure that the change has in fact been applied, but for that, testing a single detail of the end state should be sufficient. In the context of "Given, When, Then" tests, "When" can be missing for declarative code tests, which suggests that the code does not create variable outcomes. Many tools and practices for testing dynamic code are not appropriate for declarative code.

When variables or conditionals are used with declarative code, it makes sense to test the code with more complex tests. There is a "When" now. However, if the declarative code is complex enough that it needs complex testing, that is a sign that the logic should be pulled out of the declarative section and consolidated into a library written in a procedural language, and tested independently.
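As an illustration of logic pulled out of a declarative definition into a library, the sketch below uses a hypothetical sizing rule. The function name, the environment names and the instance counts are invented for the example; the point is that the conditional now lives in ordinary code that can be unit tested offline.

```python
def instance_count(environment: str, high_availability: bool) -> int:
    """Decide how many instances a stack should request (hypothetical rule)."""
    if environment == "prod":
        return 3 if high_availability else 2
    # Every non-production environment gets a single instance.
    return 1


def test_prod_with_high_availability() -> None:
    # Given a production environment
    environment = "prod"
    # When high availability is requested
    count = instance_count(environment, high_availability=True)
    # Then three instances are requested
    assert count == 3
```

The declarative code then only receives the computed value as a parameter, so it no longer contains a branch that needs its own test.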

Another useful test for declarative code is to ensure that the complex infrastructure created or modified by a complex piece of declarative code works as intended, as opposed to checking its state and ensuring the desired state has been transferred correctly to the infrastructure resources.

In any case, declarative code tests are not useful without the actual infrastructure: they are not meaningful when run with test doubles.

Infrastructure Tests are Slow

Speeding up the test execution involves a combination of strategies. Some of these strategies are not particular to infrastructure testing, they apply to software testing in general:

The following strategies are specific to infrastructure testing:

Decide between Ephemeral or Persistent Instances

An infrastructure resource instance may be created and then destroyed every time it is used (an ephemeral instance) or it can be left running between tests and reused (a persistent instance). Persistent instances can make the tests significantly faster, but they can also make the tests inconsistent, as their state can be changed in unpredictable ways by previous tests. On the other hand, ephemeral instances may slow down the tests, as they need to be created every time, but they are cleaner and give more consistent results. The right choice depends on the particular risks involved, on a case-by-case basis.
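The trade-off can be made concrete with a toy model. In the sketch below, FakePlatform is an in-memory stand-in for an infrastructure platform and every "test" leaves one file behind; all names are hypothetical and nothing here talks to a real platform.

```python
from __future__ import annotations

from contextlib import contextmanager


class FakePlatform:
    """In-memory stand-in for an infrastructure platform (hypothetical)."""

    def __init__(self) -> None:
        self.created = 0

    def create_instance(self) -> dict:
        self.created += 1
        return {"id": self.created, "files": []}

    def destroy_instance(self, instance: dict) -> None:
        instance.clear()


@contextmanager
def ephemeral_instance(platform: FakePlatform):
    """Create a fresh instance for one test and destroy it afterwards."""
    instance = platform.create_instance()
    try:
        yield instance
    finally:
        platform.destroy_instance(instance)


def run_tests(platform: FakePlatform, persistent: bool, test_count: int) -> list[int]:
    """Run tests that each write one file; return the file count each test observed."""
    observed = []
    shared = platform.create_instance() if persistent else None
    for _ in range(test_count):
        if persistent:
            # State left by earlier tests is still present.
            shared["files"].append("leftover")
            observed.append(len(shared["files"]))
        else:
            with ephemeral_instance(platform) as instance:
                instance["files"].append("leftover")
                observed.append(len(instance["files"]))
    if persistent:
        platform.destroy_instance(shared)
    return observed
```

With ephemeral instances every test sees the same starting state at the cost of one creation per test; with a persistent instance there is a single creation, but each test sees what the previous ones left behind.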

Decide between Online and Offline Tests

Some types of tests must run online, requiring infrastructure on the real cloud platform. Others can run offline, on the pipeline agent, without needing a connection to the cloud platform. Offline tests are usually much faster, but they have relatively limited use when it comes to infrastructure software.

Dependencies Complicate Testing Infrastructure

Infrastructure code is particular in that it needs the infrastructure platform and its APIs to work, so the infrastructure platform (or a subset of its APIs) is a required dependency. This may be worked around by using test doubles. Also, there is a growing number of tools that allow mocking the API of the cloud vendors. However, it is more useful to use test doubles for other infrastructure components than for the infrastructure platform itself.

Shared Development Environments Tend to Break

If multiple developers use a shared development environment to test infrastructure changes, that environment tends to get into a bad state because of uncontrolled, concurrent changes. Instead, each developer working with infrastructure code should create their own instance of the infrastructure, and destroy it when it is no longer actively used.

Local Testing

People working on infrastructure code, especially when using TDD techniques, should be able to run the tests locally before pushing code into the shared pipeline and environments. This may require setting up "personal infrastructure instances". Use the same test orchestration scripts across local work and pipeline stages. Doing this ensures that tests are set up and run consistently everywhere. Do not couple test orchestration to the pipeline, because this makes it difficult to set up and run tests consistently outside the pipeline. Instead, implement your test orchestration in a separate script or tool. The test stage should call this tool, passing a minimum of configuration parameters.
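A minimal sketch of such an orchestration tool follows. The directory layout (stacks/NAME/tests), the TEST_ENVIRONMENT variable and the option names are assumptions made for the example; the only point is that a workstation and a pipeline stage run the same entry point and differ only in the parameters they pass.

```python
from __future__ import annotations

import argparse
import os
import subprocess
import sys


def build_parser() -> argparse.ArgumentParser:
    """Define the small set of parameters the test stage has to pass."""
    parser = argparse.ArgumentParser(description="Run the tests of one stack")
    parser.add_argument("--stack", required=True, help="name of the stack to test")
    parser.add_argument("--environment", default="local", help="target environment name")
    return parser


def plan_test_run(argv: list[str]) -> tuple[list[str], dict[str, str]]:
    """Translate command line arguments into a test command and extra environment."""
    args = build_parser().parse_args(argv)
    command = [
        sys.executable, "-m", "unittest", "discover",
        "-s", f"stacks/{args.stack}/tests",  # hypothetical project layout
    ]
    extra_env = {"TEST_ENVIRONMENT": args.environment}  # hypothetical variable
    return command, extra_env


def main(argv: list[str] | None = None) -> int:
    """Run the planned command and return its exit code."""
    command, extra_env = plan_test_run(sys.argv[1:] if argv is None else argv)
    return subprocess.run(command, env={**os.environ, **extra_env}).returncode
```

A thin wrapper script would end with sys.exit(main()). A developer might then run it with --stack network, while the pipeline stage calls the same script with --stack network --environment ci.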

Cloud Mocking Tools