Blue-Green Deployments

From NovaOrdis Knowledge Base

External

Internal

Overview

The idea behind the blue-green deployment technique is that at any time there are two similar instances of the application in operation. One instance takes production load, while the other is being upgraded to the next version of the application, tested, and prepared for production. Then, the traffic is switched over from the current instance to the new one. This sequence repeats indefinitely. This technique reduces the time it takes to put "done" software in production, and also speeds up recovery in case of problems, as explained below.


Application instance vs. environment. An instance of an application may comprise a large multi-node cluster. We avoid naming it "environment", because we reserve the word "environment" for a different logical abstraction: a multi-application structure that provides complex business functionality to its users. While we can certainly design a workflow that implements a blue-green switch-over at environment level, we want to make sure that the components of an environment - the applications - can undergo a blue-green switch-over in isolation from other components of the environment.

One of the most sensitive moments of putting a new application version in production is the cut-over moment.

At its simplest, the cut-over requires scheduling a maintenance outage: the traffic is cut, the old version is shut down, the new version is spun up and traffic is resumed. This strategy results in downtime, which is something we want to avoid. Moreover, if the new version breaks, the application will not work. This requires shutting traffic down again while the old version is restored and put back into service, which incurs significantly more downtime.

Blue-green deployments are a solution to this situation.

The blue-green technique requires two application instances running at the same time. They are conventionally named the "blue instance" and the "green instance". The instances must be as similar as possible. At a certain moment, the blue instance is live and is serving production traffic. The green instance, which was idling alongside it, is upgraded to a new version of the application. The green instance is then tested, and once the software is proven to work correctly, the production traffic is switched over to it. The in-flight production traffic is drained from the blue instance until the blue instance becomes idle. At this point, the entire production traffic is served by the new application version running in the green instance. The blue instance is left idling for a while, until we have proof that the green instance works as expected. Then, the blue instance can be used to prepare the next version, by repeating the cycle. This way, both instances regularly cycle through three roles: live, previous version kept for rollback, and next version being prepared.

If the green instance does not work as expected, the production traffic is routed back to the idling blue instance, and the green instance is taken down for troubleshooting.
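
The switch-over and the rollback path described above can be sketched as a small router model. This is a minimal illustration under stated assumptions, not a production load balancer: the `BlueGreenRouter` class, its method names and the `healthy` callback are hypothetical, and in a real deployment the switch would be a load balancer, service selector or DNS change.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Toy model of the traffic switch between two application instances."""

    def __init__(self, versions: Dict[str, str], live: str = "blue") -> None:
        self.versions = dict(versions)  # instance name -> deployed version
        self.live = live                # instance currently serving production

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def upgrade_idle(self, version: str) -> None:
        # Only the idle instance is upgraded; the live one keeps serving.
        self.versions[self.idle] = version

    def switch_over(self, healthy: Callable[[str], bool]) -> bool:
        # Cut over only if the idle instance passes its checks.
        candidate = self.idle
        if not healthy(candidate):
            return False
        self.live = candidate
        return True

    def roll_back(self) -> None:
        # The previous instance still runs the old version,
        # so a rollback is just another switch.
        self.live = self.idle


router = BlueGreenRouter({"blue": "1.0", "green": "1.0"})
router.upgrade_idle("1.1")                 # green gets the new version
router.switch_over(lambda name: True)      # tests passed, green goes live
print(router.live, router.versions[router.live])   # green 1.1
router.roll_back()                         # green misbehaves, back to blue
print(router.live, router.versions[router.live])   # blue 1.0
```

The point of the model is that a rollback costs no more than the original switch, because the previous version is never shut down during the cut-over.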

The application design should allow for dealing with the transactions that were processed while the green instance was live, and which would otherwise be missing after the rollback. A few techniques for this situation are:

  • Feed traffic to both application instances.
  • Switch both instances to read-only mode and, only after the green instance is proven valid, resume read-write on green only and remove access from blue.
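
The first technique, feeding traffic to both instances, could look roughly like the sketch below. It assumes that each instance keeps its own record of processed requests; the `Instance` class and the `mirror` function are hypothetical names used only for illustration. A real setup would more likely mirror traffic at the load balancer or message queue level, and would have to deal with non-idempotent requests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    processed: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        self.processed.append(request)
        return f"{self.name} handled {request}"


def mirror(request: str, live: Instance, standby: Instance) -> str:
    """Send the request to both instances; only the live response is returned."""
    response = live.handle(request)
    try:
        standby.handle(request)  # keeps the standby current in case of rollback
    except Exception:
        pass  # assumption: a standby failure must not affect production traffic
    return response


blue, green = Instance("blue"), Instance("green")
for req in ["order-1", "order-2"]:
    mirror(req, live=green, standby=blue)

# If traffic is rolled back to blue, it has seen every request green has.
print(blue.processed == green.processed)
```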

The application instance being prepared for production could be deployed manually, but that is not advisable. A more efficient approach is to perform the application deployment and testing (and optionally the infrastructure provisioning) as part of a completely automated continuous delivery pipeline.

Handling State

The above scenario assumes that the blue and green instances share a database. If no schema changes are introduced by the new version, there are no additional complications. However, if the new version requires a new schema version, the deployment of schema changes must be separated from the application upgrade.

The first stage is to apply a database refactoring that changes the schema so that it supports both the new and the old version. After the schema change is deployed and verified to work, deploy the new version of the application using the blue-green technique.
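
As an illustration of a schema change that supports both versions, the sketch below uses an in-memory SQLite database. The `customer` table, its columns and the function names are invented for the example: the old application version is assumed to read a single `name` column, while the new version is assumed to read `first_name` and `last_name`. The migration only adds columns and backfills them, so the old column stays in place.

```python
import sqlite3


def create_old_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (name) VALUES ('Ada Lovelace')")


def apply_compatible_migration(conn: sqlite3.Connection) -> None:
    # Stage 1: additive change only, deployed before the new application version.
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")
    # Backfill the new columns from the old one (splits on the first space).
    conn.execute(
        "UPDATE customer SET "
        "first_name = substr(name, 1, instr(name, ' ') - 1), "
        "last_name = substr(name, instr(name, ' ') + 1)"
    )
    conn.commit()


def old_version_read(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT name FROM customer").fetchall()


def new_version_read(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT first_name, last_name FROM customer").fetchall()


conn = sqlite3.connect(":memory:")
create_old_schema(conn)
apply_compatible_migration(conn)
print(old_version_read(conn))   # the old version still works
print(new_version_read(conn))   # the new version works as well
```

The sketch covers reads only. While both versions are writing, the old and new columns would also have to be kept in sync, for example with triggers or application-level dual writes, and the old column would be dropped only after the old version is no longer needed for rollback.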

State Continuity in Operations

Implementations