Testing in production is the practice of testing your features in the environment where they will actually run. It does not mean releasing untested code to users and hoping it works. It means using feature flags to control which users see which treatment of a feature while you test it. It works best as an addition to pre-production testing, not as a replacement for it.
Deployment vs. Release
In order to explain testing in production, we should first explain the difference between deployment and release.
Deployment means pushing new code to the production environment. It does not mean that code is actually handling production traffic. Deployment on its own is therefore a near-zero-risk activity for end users. Release, on the other hand, is the act of exposing end users to a new version. This is what actually affects the user experience, and as a result, it can be risky.
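To make the distinction concrete, here is a minimal sketch in which a new version is deployed next to the old one but released to nobody until a runtime setting changes. The names `RELEASED_TO`, `render_homepage_v1`, and `render_homepage_v2` are hypothetical and exist only for illustration.

```python
# Hypothetical set of users the new version has been released to.
# An empty set means "deployed but not released".
RELEASED_TO: set[str] = set()


def render_homepage_v1(user_id: str) -> str:
    return f"v1 homepage for {user_id}"


def render_homepage_v2(user_id: str) -> str:
    # This code is deployed to production alongside v1,
    # but it handles no traffic until someone is added to RELEASED_TO.
    return f"v2 homepage for {user_id}"


def render_homepage(user_id: str) -> str:
    # Release decision: made at request time, not at deploy time.
    if user_id in RELEASED_TO:
        return render_homepage_v2(user_id)
    return render_homepage_v1(user_id)
```

Adding an internal tester to `RELEASED_TO` releases v2 to that one person while every other user keeps seeing v1, with no new deployment involved.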
From these definitions, technically we should be saying “blue green release” instead of “blue green deployment”, and “canary release” instead of “canary deployment”, because what these techniques actually affect is what customers see, not just what code is in production. Likewise, it’s not technically a bad deploy that causes outages and angry customers; it’s a bad release.
The Shortcomings of Staging
For software testing to be effective at predicting how a rollout will perform under the stress of production traffic, the test environment has to be as close to the production environment as possible. One way to attempt this is to maintain a staging environment, and try to keep it as in-sync with production as possible.
However, differences between staging and production systems occur regularly, often out of necessity. For example, stateful systems like databases must run as separate instances in order to maintain data integrity. Further, the staging environment commonly runs on a differently sized cluster than production, with different configurations for things like load balancers and queues, and with less monitoring.
Most of this can be fixed or mitigated, but doing so requires a group of software engineers to spend a lot of time keeping staging as close to production as possible. And because production is constantly changing, this time has to be spent continuously.
This isn’t to say that staging is unnecessary or not useful. It does mean, though, that testing in staging should not be the only testing you do before you release to your end users. Some tests can be done perfectly well in staging, but others work much better in production with production data. Ideally, testing in staging should be a precursor to testing in production. Neither one should replace the other.
How to Test in Production
One of the best ways to reduce the risks of testing in production is with feature flags. Essentially, a feature flag is an if/then statement wrapped around a new feature, allowing the software development team to turn that feature on and off without deploying any code changes.
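As a rough illustration of that if/then wrapper, here is a minimal sketch. The in-memory `FLAGS` dictionary and the function names are assumptions made for the example; a real system would read flag state from a flag service or configuration store so it can change without a redeploy.

```python
# Hypothetical in-memory flag store, for illustration only.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str) -> bool:
    """Return the flag's current state; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def legacy_checkout(cart: list[float]) -> str:
    return f"legacy total: {sum(cart):.2f}"


def new_checkout(cart: list[float]) -> str:
    return f"new total: {sum(cart):.2f}"


def checkout(cart: list[float]) -> str:
    # The flag is the if/then statement wrapped around the new feature.
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Flipping `FLAGS["new_checkout"]` to `True` routes calls to `new_checkout` without touching the deployed code. Defaulting unknown flags to off is a deliberate choice here: a missing or misspelled flag falls back to the existing behavior.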
Feature flags prevent many of the worst failure modes of testing in production. They enable disaster recovery by providing a kill switch for each feature, allow near-real-time monitoring of feature releases to check for performance degradation, and keep testing from creating a poor user experience. Further, feature flags allow for more in-depth A/B testing, easy canary releases, unique benefits for DevOps organizations, and even increased observability.
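A kill switch and a canary release can both be sketched with a per-flag configuration and a stable hash of the user ID. This is a simplified illustration, not how any particular flag product works; `FLAG_CONFIG`, `bucket`, and `treatment` are hypothetical names.

```python
import hashlib

# Hypothetical flag configuration: a kill switch plus a rollout percentage.
FLAG_CONFIG = {"new_search": {"killed": False, "rollout_percent": 10}}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def treatment(flag_name: str, user_id: str) -> str:
    """Return "on" or "off" for this user."""
    config = FLAG_CONFIG.get(flag_name)
    # Kill switch: unknown or killed flags are off for everyone.
    if config is None or config["killed"]:
        return "off"
    # Canary: only users whose bucket falls below the percentage see the feature.
    if bucket(flag_name, user_id) < config["rollout_percent"]:
        return "on"
    return "off"
```

Because the bucket comes from a hash rather than a random draw, a given user gets the same treatment on every request, and raising `rollout_percent` only adds users to the "on" group. Setting `killed` to `True` turns the feature off for everyone immediately.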
With a feature flag management system like Split, it’s easier than ever to effectively manage production testing and gain the benefits of testing on real users with production data.