Testing a Feature-Flagged Change

Hard truth: Engineers are deploying half-finished features into production, and they’re doing it on purpose. These engineers are able to do this without getting fired because they’re using feature flags to hide their partially completed work. You can too.

It might sound crazy to be rolling out features to production before they’re finished, but it’s actually in keeping with a key tenet of continuous delivery – the ability to separate code deploy from feature release. By placing a partially implemented feature behind a flag, engineers can break that feature’s development into small chunks. This allows engineers to frequently integrate their work into mainline, avoiding long-lived branches and painful merge conflicts.

However, even if your work is safely protected behind a flag, it still needs to be tested before it’s released. In this post, I’ll explore some of the options you have to test a feature-flagged change.

Free shipping!

We’ll use an example feature and explore our options for testing that feature. Imagine we’re on a product delivery team that works for an e-commerce company, and we’re implementing a free shipping feature. Specifically, we want to offer free shipping for any order over $100. Since this is a reasonably complex change, involving multiple teams, we’ve been developing it behind a free-shipping feature flag. 
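
To make "behind a flag" concrete, here is a minimal sketch of how the free shipping rule might be guarded. The flat shipping rate, the function name, and the plain dictionary standing in for a feature-flagging system are all assumptions made for illustration; only the $100 threshold and the free-shipping flag name come from our example.

```python
from decimal import Decimal
from typing import Dict

FREE_SHIPPING_THRESHOLD = Decimal("100.00")  # "any order over $100"
STANDARD_SHIPPING = Decimal("7.99")          # hypothetical flat rate


def shipping_cost(order_total: Decimal, flags: Dict[str, bool]) -> Decimal:
    """Return the shipping charge for an order.

    `flags` is a stand-in for whatever your feature-flagging system exposes.
    A missing flag is treated as off, so unfinished work stays hidden.
    """
    flag_on = flags.get("free-shipping", False)
    if flag_on and order_total > FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return STANDARD_SHIPPING
```

With the flag off, every order pays the standard rate, so the half-finished feature can ship to production without changing behavior for anyone.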

At this point, we’re far enough along with development that we’d like to start some exploratory testing of the free shipping functionality. We have a few different ways we could approach that. We could take a traditional approach and do that testing in a pre-production environment. But, since we are using feature flags, we could also go wild and do our testing live, in production! Let’s take a look at both of those approaches.

Testing before production

It’s common to perform exploratory testing of a new feature in some sort of pre-production environment. These environments go by various names. You might know yours as testing, qa, uat, staging, or perhaps pre-prod.

Testing a feature-flagged change in one of these environments is as simple as turning on the flag for that feature in that environment, and then starting to test. 

However, if it’s still early days for your feature, then there’s a risk of destabilizing the environment by enabling it before it’s ready. One way to avoid this risk is to only turn the feature on for specific users. For example, if our feature-flagging system supports it, we might turn the feature on only for the person testing it. Now, even if the feature has bugs, it won’t disrupt other users of the environment.
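
As a rough sketch of per-user targeting, a flag can carry an allow-list of user IDs and stay off for everyone else. The `Flag` class and the `tester-42` ID below are hypothetical; a real feature-flagging system will have its own way of expressing the same idea.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Flag:
    """A flag that is off by default and on for an allow-list of users."""
    name: str
    enabled_for_everyone: bool = False
    targeted_user_ids: Set[str] = field(default_factory=set)

    def is_enabled(self, user_id: Optional[str]) -> bool:
        if self.enabled_for_everyone:
            return True
        # Anonymous requests (user_id is None) can never match the allow-list.
        return user_id is not None and user_id in self.targeted_user_ids


# Turn the feature on only for the person testing it (hypothetical ID).
free_shipping = Flag("free-shipping", targeted_user_ids={"tester-42"})
```

Here `free_shipping.is_enabled("tester-42")` is true, while any other user of the environment still sees the old behavior.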

Testing features in a pre-production environment presents a perennial challenge – we’re never able to fully reproduce the production environment that our actual users live in. We rarely run exactly the same software versions as production, and we almost never have the same data. Feature flags present us with a solution to this problem: we can do our testing in production!

Testing in production

We’ve seen that we can use feature flags to test a feature in a pre-production environment without impacting other users. When you stop and think about it, we can use this same approach to test that feature in production. There are a couple of ways we can do that.

Targeted Release

If our feature-flagging system provides a “targeted release” capability, we can test a feature in production by turning it on just for the people testing it. You can think of this as a kind of Canary Release, but one targeted specifically at users who will be actively testing the feature.

This type of targeted release is a common way of testing a feature in production without releasing it to your general user-base. However, there are some situations where this technique isn’t suitable. If you’re testing an area of the system that doesn’t have a concept of a logged-in user – a marketing page perhaps, or some functionality within a signup flow – then your system may not be able to identify who the current user is, and thus you won’t be able to turn the feature on for specific users. In those scenarios, we can turn to other techniques.

Flag Overrides

Another way a tester can turn a feature on is via a flag override – a mechanism that explicitly tells your feature flag system to force a flag to be on for the current user. You can signal this using a browser cookie, a query parameter, or an HTTP header. A tester would typically use a hidden dev screen to control these flag overrides, turning an override on in production for a short period while testing a feature.

The flag override approach’s main benefit is that your feature-flagging system doesn’t need any knowledge of who the user is – you just directly override the flag that you want to test. 
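
Here is a minimal sketch of how an override might be read from a request. The `flag_override_` prefix, the `X-Flag-Override-` header name, and the `read_override` function are naming conventions invented for this example, and plain dictionaries stand in for whatever request object your web framework provides.

```python
from typing import Mapping, Optional

# Hypothetical naming conventions for this sketch.
OVERRIDE_PREFIX = "flag_override_"
HEADER_PREFIX = "X-Flag-Override-"

_ON = {"1", "true", "on"}
_OFF = {"0", "false", "off"}


def read_override(flag_name: str,
                  query: Mapping[str, str],
                  cookies: Mapping[str, str],
                  headers: Mapping[str, str]) -> Optional[bool]:
    """Return True or False if the request forces the flag, else None.

    Sources are checked in order: query parameter, cookie, HTTP header.
    Unrecognized values are ignored rather than treated as "off".
    """
    candidates = (
        query.get(OVERRIDE_PREFIX + flag_name),
        cookies.get(OVERRIDE_PREFIX + flag_name),
        headers.get(HEADER_PREFIX + flag_name),
    )
    for raw in candidates:
        if raw is None:
            continue
        value = raw.strip().lower()
        if value in _ON:
            return True
        if value in _OFF:
            return False
    return None
```

Note that nothing in this function needs a user ID, which is what makes the approach usable on pages with no logged-in user.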

Which should I use?

We’ve looked at a few different techniques for testing a feature-flagged change. Which should you use? 

Targeted Release is a good default. It’s simple and straightforward to set up – if your feature-flagging system supports it – and it allows you to test features in a very realistic environment: production! When you can’t employ a targeted release – if you’re testing a system that doesn’t have a concept of current user, for example – then flag overrides can be used instead. However, be aware that flag overrides will probably require a custom implementation.
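
If you end up supporting both techniques, one plausible way to combine them is to let an explicit override win, then fall back to user targeting, then to the flag’s default. This ordering is a design choice made for this sketch rather than a rule, and the function and parameter names are hypothetical.

```python
from typing import Optional, Set


def is_flag_enabled(override: Optional[bool],
                    user_id: Optional[str],
                    targeted_user_ids: Set[str],
                    default: bool = False) -> bool:
    """Resolve a flag: explicit override, then targeted release, then default."""
    if override is not None:
        # A tester has forced the flag on or off for this request.
        return override
    if user_id is not None and user_id in targeted_user_ids:
        # The current user is part of the targeted release.
        return True
    return default
```

An anonymous visitor on a marketing page would pass `user_id=None`, so only an override can turn the feature on for them.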

Learn More About Testing in Production

While testing in a pre-production environment is a well-established approach, it’s often unnecessary when you have a feature-flagged change. Testing in production may feel risky, but it can be done in a safe, controlled way, as long as you’re aware of some of the techniques we’ve discussed here.
