Configuration Drift

More and more companies are finding that using a staging environment to test features does more harm than good. The process separates the environment where end users interact with new features from the environment where engineering teams test them, and that separation invites problems. Configuration drift occurs as these two (or more) environments grow further and further apart.

As engineering teams evolve and their product develops, changes are made to both the configuration and the infrastructure of the application. When those changes are not applied to every environment in the same way, the environments diverge. That divergence is configuration drift.

Increasing the Divide between Staging and Production

Let’s look at a typical example of configuration drift in practice. An engineer gets paged late one night because of an incident in his mobile application. He looks at the logs and identifies the problem. To fix it, he needs to update a specific configuration value in production. He makes the change in production and goes back to sleep. He has fixed the issue, but because he did not make the same change in staging, he has also widened the divide between his staging and production environments. Changes made during incident management are a common reason staging no longer matches production. Nobody intends to pull the environments apart, but that is generally what happens when several environments are in play.
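To make the scenario concrete, here is a minimal sketch of what that late-night fix does to the two environments. It assumes each environment's configuration can be represented as a flat dictionary; the setting names (`request_timeout_seconds`, `max_connections`) and the `find_drift` helper are hypothetical and chosen only for illustration.

```python
MISSING = "<missing>"  # placeholder for a key that exists in only one environment


def find_drift(staging: dict, production: dict) -> dict:
    """Return every key whose value differs, mapped to (staging, production)."""
    drift = {}
    for key in sorted(staging.keys() | production.keys()):
        staging_value = staging.get(key, MISSING)
        production_value = production.get(key, MISSING)
        if staging_value != production_value:
            drift[key] = (staging_value, production_value)
    return drift


# Both environments start from the same (hypothetical) settings.
baseline = {"request_timeout_seconds": 10, "max_connections": 100}
staging = dict(baseline)
production = dict(baseline)

# The late-night fix is applied to production only.
production["request_timeout_seconds"] = 30

print(find_drift(staging, production))
# → {'request_timeout_seconds': (10, 30)}
```

A single forgotten change is enough to make the two environments disagree, and every further production-only fix adds another entry to that result.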

As the differences between your production and test environments grow, trust in your staging environment declines. You can no longer test reliably in staging, because the same tests may produce different results in production. Configuration drift lets bugs slip through unidentified and costs your team time and money.

Automating the Creation and Maintenance of Environments

One way to avoid configuration drift is to apply infrastructure-as-code principles. Instead of trying to keep environments in sync by hand, you define each environment in code, and that code applies the same configuration to all of your environments. The risk of setting up environments manually is that you won’t set them all up the same way. It is far more consistent to let the computer make the changes automatically. Ideally, you also want to avoid repeating yourself in code (the DRY principle, “Don’t Repeat Yourself”). Looking back at the example above, the engineer should not have changed the configuration in production alone. He should have changed the script that defines all of the environments, and the fix would then have been applied to every environment automatically.
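As a rough sketch of the idea, the snippet below keeps one shared definition and generates every environment from it, so a fix is made once and reaches all environments. It is not a real infrastructure-as-code tool; the setting names, host names, and the `render_environments` helper are assumptions made for illustration.

```python
# Settings shared by every environment live in exactly one place (DRY).
BASE_CONFIG = {"request_timeout_seconds": 10, "max_connections": 100}

# Only values that genuinely must differ per environment are listed here.
# The host names are hypothetical.
ENVIRONMENT_OVERRIDES = {
    "staging": {"database_host": "db.staging.example.internal"},
    "production": {"database_host": "db.production.example.internal"},
}


def render_environments(base: dict, overrides: dict) -> dict:
    """Build the full configuration of each environment from the shared base."""
    return {name: {**base, **env} for name, env in overrides.items()}


# The incident fix is made once, in the shared definition...
fixed_base = {**BASE_CONFIG, "request_timeout_seconds": 30}

# ...and every environment generated from it picks up the change.
for name, config in render_environments(fixed_base, ENVIRONMENT_OVERRIDES).items():
    print(name, config["request_timeout_seconds"])
```

Because nothing is edited by hand in a single environment, there is no step at which staging can be forgotten.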

Another way to avoid the issues caused by configuration drift is to use feature flags to test your code safely in production. Once you remove your pre-production environments, you no longer have to worry about the state of staging, and you can release faster.
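A feature flag can be sketched in a few lines. The example below is a simplified, hand-rolled illustration rather than the API of any particular feature flag product; the flag name, the user IDs, and the `is_enabled` helper are hypothetical. It assumes a flag can be switched on for named testers and for a fixed percentage of users.

```python
import hashlib

# Hypothetical flag definitions; in practice these would live in a flag service.
FLAGS = {
    "new_checkout": {
        "enabled": True,
        "rollout_percent": 10,            # share of users who see the feature
        "allow_users": {"qa-engineer-1"},  # testers who always see it
    }
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, flags: dict = FLAGS) -> bool:
    """Decide whether this user should see the flagged feature."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or switched-off flags keep the old behavior
    if user_id in flag["allow_users"]:
        return True
    return bucket(flag_name, user_id) < flag["rollout_percent"]


if is_enabled("new_checkout", "qa-engineer-1"):
    print("serving the new checkout flow")
else:
    print("serving the existing checkout flow")
```

The new code ships to production switched off for most users, so it can be exercised against real infrastructure by a small group and turned off again without a redeploy.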