There’s a common mistake engineers make when getting started with feature flags – they don’t test their flags. Feature-flagged code runs in production just like the rest of your code, and it benefits from automated tests in just the same way. Good automated testing is particularly important for code behind a feature flag since you can’t rely as much on manual exploratory testing to catch issues.
However, writing unit tests for feature-flagged code can be a bit of a pain. To simulate a flag being in different states, we might have to muck around with mocks and stubs, and in general unit tests for feature-flagged code can be repetitive and bloated with boilerplate. But, with a bit of intentional effort, there are ways to cut through this cruft, reducing the duplication by adding some helpers and extensions to our unit-testing framework.
In this post, we’ll look at some typical unit tests for feature-flagged code and see how we can leverage our unit-testing framework – Jest, in this case – to keep those tests trim and readable.
A Simple Feature-flagged Change
For this post, let’s imagine we’re working on a JavaScript application that includes a user profile (name, email address, and so on). Our web application doesn’t manage this detailed profile information itself; it gets it from an external user profile service. Whenever we need to display things like the current user’s name, we send a request to that service.
However, this approach causes some performance issues – the service takes some time to respond, and we are making calls to it almost every time we display a page. We’ve decided to fix this by adding some caching.
We’ll be using a feature flag to manage this change. This will allow us to:
- Stick to trunk-based development practices
- Test the functionality safely in our production environment
- Use an A/B test to measure the improvement we gain from this change objectively
- Quickly disable the change if we run into any issues post-launch
Our UserProfile Class
To make this change, we’ll be adding some logic to a UserProfile class. This class is responsible for abstracting over the details of how we access user profiles so that the rest of our codebase doesn’t have to worry about them. This also makes this class the perfect place to add some caching.
Let’s see what this UserProfile class looks like before our change. In a real codebase, this class would have more logic in it, but for the purposes of this example, we’re keeping it really simple.
module.exports = class UserProfile{
  constructor({userService}){
    this.userService = userService;
  }
  async getUser(userId){
    const response = await this.userService.fetchUserDetails(userId);
    return this.transformResponse(response);
  }
  transformResponse(response){
    // in a real system, we'd do some logic here to
    // pull out the information we care about in a nice
    // format
    return response;
  }
}
And here are the unit tests for that class:
const UserProfile = require('./userProfile');
describe("UserProfile", ()=> {
  it('makes a call to the user service with the appropriate userId', async ()=> {
    const fakeUserService = {
      fetchUserDetails: jest.fn().mockResolvedValue({})
    };
    const someUserId = 123;
    const userProfile = new UserProfile({
      userService:fakeUserService,
    });
    await userProfile.getUser(someUserId);
    expect(fakeUserService.fetchUserDetails).toHaveBeenCalledWith(someUserId);
  });
});
We create a fake userService using Jest’s built-in mock functions feature, then use that fake to verify that the UserProfile uses its userService in the right way. Again, in a real codebase, we’d probably have some additional tests, but what we have here serves our purposes.
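If Jest’s mock functions feel a bit magical, it can help to see the idea in miniature. The sketch below is a hand-rolled stub that records its calls and resolves with a canned value. The createRecordingStub helper is hypothetical and exists only for illustration; it is not how Jest implements jest.fn(), just the same basic idea.

```javascript
// Hypothetical helper, for illustration only. This is not Jest's
// implementation of jest.fn(), just the same idea in miniature.
function createRecordingStub(resolvedValue) {
  const calls = [];
  // Record the arguments of every call, then resolve with the canned value.
  const stub = async (...args) => {
    calls.push(args);
    return resolvedValue;
  };
  stub.calls = calls;
  return stub;
}

const fakeUserService = { fetchUserDetails: createRecordingStub({}) };

fakeUserService.fetchUserDetails(123).then(() => {
  // Roughly the information toHaveBeenCalledWith(123) inspects for us.
  console.log(fakeUserService.fetchUserDetails.calls); // [ [ 123 ] ]
});
```

In the real tests we’d stick with jest.fn(), which gives us this bookkeeping plus matchers like toHaveBeenCalledWith for free.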
Add Caching
Now let’s add our caching logic, controlled by a feature flag:
module.exports = class UserProfile{
  constructor({userService,featureFlags}){
    this.userService = userService;
    this.featureFlags = featureFlags;
    this.userCache = new Map();
  }
  async getUser(userId){
    if(this.featureFlags.shouldCacheUserProfile()){
      return this.getUserWithCache(userId);
    }else{
      return this.getUserWithoutCache(userId);
    }
  }
  async getUserWithCache(userId){
    if(this.userCache.has(userId)){
      return this.userCache.get(userId);
    }
    const user = await this.getUserWithoutCache(userId);
    this.userCache.set(userId,user);
    return user;
  }
  async getUserWithoutCache(userId){
    const response = await this.userService.fetchUserDetails(userId);
    return this.transformResponse(response);
  }
  transformResponse(response){
    // in a real system, we'd do some logic here to
    // pull out the information we care about in a nice
    // format
    return response;
  }
}
You can see that we’re now passing in a featureFlags object when we construct our UserProfile. That object is an instance of a simple custom class that abstracts over whatever underlying feature flagging system we’re using. It exposes each different feature flag check as a separate method call, so rather than calling featureFlagClient.isOn('CACHED_USER_PROFILE') with a hard-coded magic string, we instead call featureFlags.shouldCacheUserProfile(), which is a little nicer.
We then consult that featureFlags instance in our getUser method to check whether we should use our simple Map-based caching implementation.
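We haven’t shown the featureFlags class itself, so here’s a minimal sketch of what such a wrapper might look like. The class name FeatureFlags and its constructor shape are assumptions made for this example, and the client is a stand-in: the only thing we assume about it is the isOn(flagName) method mentioned above.

```javascript
// Hypothetical wrapper class. `client` stands in for whatever flagging
// SDK is in use; we only assume it has an isOn(flagName) method.
class FeatureFlags {
  constructor({ client }) {
    this.client = client;
  }

  // One method per flag check keeps the magic string in a single place.
  shouldCacheUserProfile() {
    return this.client.isOn('CACHED_USER_PROFILE');
  }
}

// Usage with a stand-in client that reports only this one flag as on.
const fakeClient = { isOn: (flagName) => flagName === 'CACHED_USER_PROFILE' };
const featureFlags = new FeatureFlags({ client: fakeClient });
console.log(featureFlags.shouldCacheUserProfile()); // true
```

A nice side effect of this design is that faking the flags in a test only requires an object with a shouldCacheUserProfile method; the test never needs to know about the underlying client.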
The implementation looks good, but it needs some tests. Let’s take a look at those next.
Two Categories of Feature Flag Test in Jest
When testing a feature-flagged change, there are typically two distinct things we want to test. We want to test what’s different when the flag changes – the behavior which varies based on the flag – and we want to test what stays the same – invariant behavior that should not be impacted by the flag changing.
What does that mean for our example? The thing that should vary is the caching behavior. When the flag is on, we should not see duplicate calls to the underlying user service when we ask for information on the same user. The thing that should stay consistent is how we interact with that user service when we call it. Whether we’re caching or not, when we do interact with that service, we should pass the same userId through, for example.
Here are some unit tests which describe what we expect to be different:
describe("UserProfile", ()=> {
  let fakeUserService;
  beforeEach( ()=>{
    fakeUserService = {
      fetchUserDetails: jest.fn().mockReturnValue({fake:"response"})
    };
  });
  describe("with caching off", ()=> {
    let fakeFeatureFlags;
    beforeEach( ()=>{
      fakeFeatureFlags = createFakeFeatureFlags({shouldCacheUserProfile:false});
    });
    it('calls the user service every time', async ()=> {
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      await userProfile.getUser('blah');
      await userProfile.getUser('blah');
      expect(fakeUserService.fetchUserDetails).toHaveBeenCalledTimes(2);
    });
  });
  describe("with caching on", ()=> {
    let fakeFeatureFlags;
    beforeEach( ()=>{
      fakeFeatureFlags = createFakeFeatureFlags({shouldCacheUserProfile:true});
    });
    it('calls the user service once, then returns cached result', async ()=> {
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      const firstResult = await userProfile.getUser('blah');
      const secondResult = await userProfile.getUser('blah');
      expect(fakeUserService.fetchUserDetails).toHaveBeenCalledTimes(1);
      expect(secondResult).toEqual(firstResult);
    });
  });
  function createFakeFeatureFlags(overrides={}){
    return {
      shouldCacheUserProfile: ()=> overrides.shouldCacheUserProfile
    };
  }
});
In these tests, we have two different describe blocks. One describes the behavior when the feature flag is on, the other when it is off. We achieve that via a beforeEach hook for each describe block, which sets up a fake featureFlags instance to simulate the flag being on or off, respectively, via a stubbed-out response to its shouldCacheUserProfile method.
With the caching flag off, the first test validates that when we ask a UserProfile instance for the same user twice, it calls through to its backing user service each time, even though we’re using the same user id. With the caching flag on, the second test validates the opposite – when we ask the UserProfile for the same user twice, it only calls its backing service once, instead pulling from its cache for the second request.
This is an excellent example of testing a feature flag by testing what’s different. With the flag off, we expect duplicate calls to the backing service. With the flag on, we expect a single call. Now let’s look at some tests which validate what stays the same.
describe("UserProfile", ()=> {
  let fakeUserService;
  beforeEach( ()=>{
    fakeUserService = {
      fetchUserDetails: jest.fn().mockReturnValue({fake:"response"})
    };
  });
  describe("with caching off", ()=> {
    let fakeFeatureFlags;
    beforeEach( ()=>{
      fakeFeatureFlags = createFakeFeatureFlags({shouldCacheUserProfile:false});
    });
    it('makes a call to the user service with the appropriate userId', async ()=> {
      const someUserId = 123;
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      await userProfile.getUser(someUserId);
      expect(fakeUserService.fetchUserDetails).toHaveBeenCalledWith(someUserId);
    });
    // … other tests elided
  });
  describe("with caching on", ()=> {
    let fakeFeatureFlags;
    beforeEach( ()=>{
      fakeFeatureFlags = createFakeFeatureFlags({shouldCacheUserProfile:true});
    });
    it('makes a call to the user service with the appropriate userId', async ()=> {
      const someUserId = 123;
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      await userProfile.getUser(someUserId);
      expect(fakeUserService.fetchUserDetails).toHaveBeenCalledWith(someUserId);
    });
    // … other tests elided
  });
});
We have the same two describe blocks, with the same beforeEach hooks simulating the caching flag as off or on. However, we’re looking at two different tests here. These tests both do the same thing – they validate that when we ask a UserProfile instance to get a user by their ID, that same ID gets passed to the underlying user service.
This is an example of testing what stays the same – whether the caching flag is on or off, we always want to correctly use the user service to retrieve data about the user. You can imagine further tests that validate that we parse and transform the response from that service correctly.
Clean Up the Duplication
When I said that these tests do the same thing, I meant it. They are the exact same test. That’s not great. I’ve often seen this phenomenon in the wild – an engineer in a hurry needs to test that their feature flag is not breaking existing behavior, so they copy-paste the tests and just make whatever setup changes are needed to simulate the flag being in a different state.
This approach is not good for all the reasons that duplicated code is not good. The duplicated tests can drift apart over time, as one test is updated but the other overlooked. What’s worse is the burden these tests place on a reader of the test code. It’s tough to detect what’s the same and what’s different. The intent of this pair of tests – to validate that behavior is unaffected by the state of the flag – is almost entirely lost.
We can fix these issues if we’re willing to roll up our sleeves and do a little bit of plumbing work. With a little understanding of how the Jest DSL works and a little functional programming, we can create a tiny framework extension that will remove this duplication and boost the expressiveness of our tests.
We’ll start by imagining how a more expressive test might look. We want a test that shows that regardless of whether the caching flag is on or off, a certain chunk of behavior remains the same. I would call this behavior that’s invariant to the state of the caching flag. Given that, here’s what a more descriptive test DSL might look like:
describe("UserProfile", ()=> {
  describeFeatureFlagInvariant('shouldCacheUserProfile', (fakeFeatureFlags)=>{
    it('makes a call to the user service with the appropriate userId', async ()=> {
      const fakeUserService = {
        fetchUserDetails: jest.fn()
      };
      const someUserId = 123;
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      await userProfile.getUser(someUserId);
      expect(fakeUserService.fetchUserDetails)
        .toHaveBeenCalledWith(someUserId);
    });
  });
});
The actual test that’s validating the behavior is the same as before. But rather than duplicating it within two different describe blocks, we’ve instead placed it inside of a new describeFeatureFlagInvariant block. The intention of this block is to say, “whether the given flag is on or off, this test should pass.” We can achieve that by having that block dynamically create two distinct describe blocks – one where fakeFeatureFlags says that the flag is true, and another where it says the flag is false.
Here’s how we can actually implement this dynamic behavior:
function describeFeatureFlagInvariant(featureCheckName,fn){
  describeFeatureFlagOn(featureCheckName,fn);
  describeFeatureFlagOff(featureCheckName,fn);
}
function describeFeatureFlagOn(featureCheckName,fn){
  describe(`with feature ${featureCheckName} on`, ()=>{
    const featureFlags = createFixedFeatureFlags(featureCheckName,true);
    fn(featureFlags);
  });
}
function describeFeatureFlagOff(featureCheckName,fn){
  describe(`with feature ${featureCheckName} off`, ()=>{
    const featureFlags = createFixedFeatureFlags(featureCheckName,false);
    fn(featureFlags);
  });
}
function createFixedFeatureFlags(featureCheckName,fixedFlagState){
  return {
    [featureCheckName]: ()=> fixedFlagState
  };
}
This turns out to not be that complicated. First let’s look at describeFeatureFlagOn and describeFeatureFlagOff, two very similar helper functions. In each function, we dynamically create a new describe block, simply by calling describe as you would in a regular test file. Within that describe block, we then call createFixedFeatureFlags to create a fake featureFlags instance with the appropriate flag checking method hardcoded to either true or false. Finally, we invoke the inner function provided to us, passing the fake featureFlags instance along.
That last part might be a bit of a mind-bender if you’ve not had much experience in the past with functional programming concepts like higher-order functions. We’re doing the same trick that Jest does to define the it and describe functions. We’re creating a function that takes another function as a parameter, doing some setup, and then calling that function ourselves. This might seem weird, but it’s a great technique to use when building these sorts of expressive, declarative framework helpers.
The last piece of code to look at is the describeFeatureFlagInvariant helper. This is quite simple; it invokes both describeFeatureFlagOn and describeFeatureFlagOff, passing in the function it received. This has the effect of creating two describe blocks, one with the flag on and one with it off, but both containing the same test. And that’s exactly what we want to achieve – run the same test twice, with the flag on and with it off, in order to validate that the tested behavior is the same regardless.
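To see the higher-order function trick in isolation, here’s a rough sketch of the same helpers with the Jest describe calls stripped out, so it runs as a plain Node script. The names withFeatureFlag and withFeatureFlagBothWays are hypothetical stand-ins invented for this example; createFixedFeatureFlags has the same shape as the helper above.

```javascript
// Same shape as the helper above: a computed property name turns the
// string we pass in into the method name on the fake.
function createFixedFeatureFlags(featureCheckName, fixedFlagState) {
  return {
    [featureCheckName]: () => fixedFlagState,
  };
}

// Hypothetical stand-in for describeFeatureFlagOn/Off without Jest:
// do some setup, then call the function we were handed, passing the
// result of that setup along.
function withFeatureFlag(featureCheckName, fixedFlagState, fn) {
  const featureFlags = createFixedFeatureFlags(featureCheckName, fixedFlagState);
  return fn(featureFlags);
}

// Hypothetical stand-in for describeFeatureFlagInvariant: run the same
// function once per flag state.
function withFeatureFlagBothWays(featureCheckName, fn) {
  return [
    withFeatureFlag(featureCheckName, true, fn),
    withFeatureFlag(featureCheckName, false, fn),
  ];
}

const results = withFeatureFlagBothWays('shouldCacheUserProfile', (flags) =>
  flags.shouldCacheUserProfile() ? 'cached path' : 'uncached path'
);
console.log(results); // [ 'cached path', 'uncached path' ]
```

The callback is written once and has no idea how many times it will be invoked or with which flag state. That’s the same property that lets a single it block inside describeFeatureFlagInvariant run under both flag states.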
Get Even More Expressive with Jest
We started creating these test extensions to help with testing a feature flag in terms of what stays the same, but it turns out that we can also use them to make the tests for what is different more expressive.
Using those helpers, we can now refactor those first tests we looked at as follows:
const UserProfile = require('./cachingUserProfile');
describe("UserProfile", ()=> {
  describeFeatureFlagOff('shouldCacheUserProfile', (fakeFeatureFlags)=>{
    it('calls the user service every time', async ()=> {
      const fakeUserService = {
        fetchUserDetails: jest.fn().mockReturnValue({fake:"response"})
      };
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      await userProfile.getUser('blah');
      await userProfile.getUser('blah');
      expect(fakeUserService.fetchUserDetails)
        .toHaveBeenCalledTimes(2);
    });
  });
  describeFeatureFlagOn('shouldCacheUserProfile', (fakeFeatureFlags)=>{
    it('calls the user service once, then returns cached result', async ()=> {
      const fakeUserService = {
        fetchUserDetails: jest.fn().mockReturnValue({fake:"response"})
      };
      const userProfile = new UserProfile({
        userService:fakeUserService,
        featureFlags:fakeFeatureFlags
      });
      const firstResult = await userProfile.getUser('blah');
      const secondResult = await userProfile.getUser('blah');
      expect(fakeUserService.fetchUserDetails)
        .toHaveBeenCalledTimes(1);
      expect(secondResult).toEqual(firstResult);
    });
  });
});
The implementation of these two tests hasn’t changed, but the way we set up the enclosing describe blocks has improved. Rather than mucking around with beforeEach blocks and hand-written describe blocks, we can use our declarative describeFeatureFlagOn and describeFeatureFlagOff helpers.
This reduces the amount of boilerplate in our test code – always a good thing! More importantly, it makes our tests more expressive to the reader. I can clearly see that these tests are validating behavior which is specific to when the user profile caching flag is on and when it’s off.
Learn More About Testing, Feature Flags, and JavaScript
Feature-flagged code benefits from automated testing just as much as any other code. High-quality unit-testing is even more valuable when you have codepaths that aren’t always exposed to other types of test.
In this post, we’ve seen the two different types of testing you can do for feature-flagged code – testing what’s the same and testing what’s different. We’ve also discovered how adding some feature-flag-specific extensions to your unit-testing framework can reduce boilerplate and increase expressiveness.
While we focused on JavaScript and Jest in this post, the same idea can be applied in any good-quality unit-testing framework. For example, this post describes how the Split SDK provides some similar extensions to Java’s JUnit testing framework.
Whatever your tech stack, make sure you test those flags!