Testing in production: what it is and how to do it

In this post, I’ll explain what production testing means, how to implement it, and why it might be the key to delivering reliable software.

Whenever I mention testing in production, I get that panicked look, like I’ve suggested skipping tests entirely. But the truth is, we’re already doing it. Your monitoring dashboards, those three a.m. alerts, even that support ticket about a weird bug—in a way, all are accidental tests in production. The issue isn’t testing in production; it’s doing it unintentionally and without safeguards.

No matter how heavily we invest in staging environments, we can never fully replicate real user behaviors, data volumes, or unexpected edge cases. I realized this while reading The DevOps Handbook. Facebook tests with internal users first, then beta testers, and then slowly rolls out changes globally. It’s controlled, measured, and actually reduces risk.

Testing in production isn’t reckless; it’s strategic validation in real-world conditions. In this post, I’ll explain what production testing means, how to implement it safely, and why it might be the key to delivering reliable software at scale.

What is testing in production?

Let me start by saying something that might sound controversial. We’re all testing in production whether we want to admit it or not. I know that probably makes you uncomfortable, but hear me out. Testing in production isn’t about being reckless or deploying broken code. It’s about being honest about the reality of software development.

So what exactly is testing in production? It’s basically running tests and evaluating software changes directly in production, where real users are interacting with your system. Instead of relying solely on staging environments that we spend so much time trying to make “production-like”, you’re actually testing with real users, real data, and real traffic patterns.

Of course, this also means that you should responsibly handle data, ensuring that compliance requirements like GDPR, HIPAA, or the applicable regulations are respected.

Now, before you think I’ve lost my mind, let me clarify. This isn’t about skipping your integration tests in dev or throwing caution to the wind. Testing in production is an additional layer of validation, not a replacement for your existing test suite.

Think of it like this: You can have the most sophisticated staging environment in the world, but it’s still just an approximation of what production really looks like.

Production environments contain complexity that is impossible, or at least too costly, to replicate elsewhere. You’ve got real user behaviors, actual data volumes, network conditions you didn’t anticipate, and all those weird edge cases that only show up when thousands of people are using your software in unexpected ways.

The key difference between testing in production and just “hoping for the best” is control and observability. When you’re doing it right, you’re not exposing all users to potential issues at once.

You might want to start with 1% of traffic, or use feature flags to show new functionality only to certain users. If something goes wrong, the blast radius is small, and you can react quickly.
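
To make the “1% of traffic” idea concrete, here is a minimal sketch of one common way to pick that 1%: hash each user ID into a stable bucket. The function names and the `new-checkout` salt are hypothetical, and a real system would usually delegate this to a feature-flag service rather than hand-roll it.

```python
import hashlib


def bucket(user_id: str, salt: str = "new-checkout") -> int:
    """Map a user to a stable bucket in the range [0, 10000).

    The salt (a hypothetical feature name) keeps buckets independent
    across features, so the same users are not always the first exposed.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 10_000


def in_rollout(user_id: str, percent: float, salt: str = "new-checkout") -> bool:
    """Return True if the user falls inside the first `percent` of buckets."""
    return bucket(user_id, salt) < percent * 100


if __name__ == "__main__":
    # The same user always gets the same answer for the same percentage.
    print(in_rollout("user-42", 1))
```

Because the bucket is derived from the user ID, a user who is in the 1% group stays in the group when you raise the percentage, which keeps their experience consistent during the rollout.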

Testing in production: methods and types

Now that we’ve established what testing in production means, let’s talk about the different ways you can do it. Several techniques have evolved over the years, and honestly, some of them are pretty clever when you think about it.

1. A/B testing

The most straightforward approach is A/B testing. This is where you release two different versions of a feature to different groups of users and see which one performs better. Maybe you’re testing a new checkout flow or trying to figure out if a blue button converts better than a green one.
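
“Performs better” needs a number behind it. As a rough sketch, assuming you only track visitors and conversions per variant, a two-proportion z-test is one standard way to check whether the difference is more than noise. The `VariantStats` type and the function names are my own, not from any particular A/B testing tool.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantStats:
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        """Conversion rate, or 0.0 when nobody has seen the variant yet."""
        return self.conversions / self.visitors if self.visitors else 0.0


def z_score(a: VariantStats, b: VariantStats) -> float:
    """Two-proportion z-score for B's conversion rate versus A's."""
    if a.visitors == 0 or b.visitors == 0:
        raise ValueError("both variants need at least one visitor")
    pooled = (a.conversions + b.conversions) / (a.visitors + b.visitors)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / a.visitors + 1 / b.visitors))
    return (b.rate - a.rate) / std_err if std_err else 0.0


def p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    green = VariantStats(visitors=1000, conversions=100)
    blue = VariantStats(visitors=1000, conversions=130)
    z = z_score(green, blue)
    print(f"z={z:.2f}, p={p_value(z):.3f}")
```

This is the textbook test, so it inherits the textbook caveats: decide the sample size before you start, and don’t stop the experiment the moment the p-value dips below your threshold.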

2. Canary releases

Canary releases are another technique that I really like. Why? Because you gradually roll out new features to small percentages of users, starting maybe with just 1% of traffic. If everything looks good, you bump it up to 5%, then 10%, and so on. The key here is that you can stop the rollout immediately if you see problems.
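
The 1%, 5%, 10% progression can be sketched as a small loop. This is a simplified illustration: `set_traffic_percent` and `is_healthy` are hypothetical callbacks standing in for whatever your load balancer and monitoring actually expose, and a real rollout would wait for a soak period before each health check.

```python
from typing import Callable, Iterable


def run_canary(
    set_traffic_percent: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: Iterable[int] = (1, 5, 10, 25, 50, 100),
) -> bool:
    """Walk through the rollout steps, stopping at the first unhealthy one.

    Returns True if the rollout reached the final step, False if it was
    halted and traffic to the new version was set back to 0%.
    """
    for percent in steps:
        set_traffic_percent(percent)
        # A real implementation would let traffic soak here before checking.
        if not is_healthy():
            set_traffic_percent(0)  # stop the rollout immediately
            return False
    return True


if __name__ == "__main__":
    history = []
    finished = run_canary(history.append, lambda: True)
    print(finished, history)
```

The important design choice is that halting is the default path on any failed check, so nobody has to make a judgment call at three a.m.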

3. Feature flags

Another one is feature flags, which give you even more granular control. These are basically switches in your code that let you turn features on or off without deploying new code. What’s cool about feature flags is that you can target specific segments. You can combine feature flags with canary releases for even more control.
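
Here’s a minimal in-memory sketch of what segment targeting can look like. Everything in it (`Flag`, `User`, `FlagStore`, the `employees` segment) is made up for illustration; production setups typically load flag definitions from a flag service or config store so they can change without a deploy.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    enabled: bool = False  # global kill switch
    allowed_segments: set[str] = field(default_factory=set)
    allowed_users: set[str] = field(default_factory=set)


@dataclass
class User:
    user_id: str
    segments: set[str] = field(default_factory=set)


class FlagStore:
    """Hypothetical in-memory flag store."""

    def __init__(self) -> None:
        self._flags: dict[str, Flag] = {}

    def define(self, name: str, flag: Flag) -> None:
        self._flags[name] = flag

    def is_enabled(self, name: str, user: User) -> bool:
        flag = self._flags.get(name)
        if flag is None or not flag.enabled:
            return False  # unknown or switched-off flags are always off
        if not flag.allowed_segments and not flag.allowed_users:
            return True  # no targeting rules: on for everyone
        return user.user_id in flag.allowed_users or bool(
            user.segments & flag.allowed_segments
        )


if __name__ == "__main__":
    store = FlagStore()
    store.define("new-checkout", Flag(enabled=True, allowed_segments={"employees"}))
    print(store.is_enabled("new-checkout", User("u1", {"employees"})))
```

Note that an unknown flag evaluates to off. Failing closed like this means a typo in a flag name hides a feature instead of exposing it.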

4. Blue-green deployment

Then there’s blue-green deployment, which is a bit different but still counts as testing in production. You maintain two identical production environments, one that’s live (let’s say “blue”) and one that’s not (“green”). You deploy your new version to the green environment, test it, and then switch all traffic over to green. If something goes wrong, you can instantly switch back to blue.
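
The switch itself is conceptually tiny, which is the whole appeal. Below is a toy sketch where each environment is just a Python callable; `BlueGreenRouter` and its methods are hypothetical names, and in practice the “router” is a load balancer, DNS record, or service mesh rule.

```python
from typing import Callable

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Toy router that sends all traffic to exactly one of two environments."""

    def __init__(self, blue: Handler, green: Handler, live: str = "blue") -> None:
        self._envs = {"blue": blue, "green": green}
        self._live = live

    @property
    def live(self) -> str:
        return self._live

    @property
    def idle(self) -> str:
        return "green" if self._live == "blue" else "blue"

    def handle(self, request: str) -> str:
        """Serve real traffic from the live environment."""
        return self._envs[self._live](request)

    def smoke_test(self, request: str) -> str:
        """Exercise the idle environment without exposing it to users."""
        return self._envs[self.idle](request)

    def switch(self) -> None:
        """Flip all traffic to the idle environment; call again to roll back."""
        self._live = self.idle


if __name__ == "__main__":
    router = BlueGreenRouter(lambda r: f"v1:{r}", lambda r: f"v2:{r}")
    print(router.smoke_test("/health"))
    router.switch()
    print(router.handle("/checkout"))
```

Rolling back is the same operation as rolling forward, so the rollback path gets exercised every time you release.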

5. Load and stress testing

Load testing and stress testing in production are also worth mentioning. Sometimes you need to know how your system behaves under real production load, not just simulated traffic. In fact, you could maintain a constant 20% load on your production systems just to keep testing performance characteristics.

6. Chaos engineering

Finally, there’s chaos engineering, which sounds scary but is actually really valuable. This is where you intentionally introduce failures into your production system to see how it responds.
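
To give a feel for it at the smallest possible scale, here’s a sketch that injects failures into a single function call and checks that a fallback kicks in. The `chaos` decorator, `InjectedFault`, and `with_fallback` are illustrative names I made up; real chaos experiments usually target infrastructure (instances, network, dependencies) with dedicated tooling.

```python
import functools
import random
from typing import Callable, Optional


class InjectedFault(RuntimeError):
    """Raised in place of a real call to simulate a failing dependency."""


def chaos(failure_rate: float, rng: Optional[random.Random] = None):
    """Decorator that makes roughly `failure_rate` of calls raise InjectedFault."""
    rng = rng or random.Random()

    def decorator(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


def with_fallback(primary: Callable, fallback: Callable) -> Callable:
    """Call `primary`, and use `fallback` when a fault is injected."""

    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except InjectedFault:
            return fallback(*args, **kwargs)

    return call


if __name__ == "__main__":
    @chaos(failure_rate=0.2)
    def fetch_recommendations(user_id: str) -> list:
        return ["item-1", "item-2"]

    safe_fetch = with_fallback(fetch_recommendations, lambda user_id: [])
    print(safe_fetch("user-42"))
```

In a real experiment you would keep the failure rate low, scope it to a small slice of traffic, and have a kill switch for the experiment itself.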

You don’t have to pick just one of these approaches. Most companies combine several of them depending on what they’re trying to accomplish and how much risk they’re willing to take on.

Production testing: process

Alright, so you’re probably wondering how to actually implement this without breaking everything. Let me walk you through the process I typically follow, and trust me, this took me a while to figure out through trial and error.

Plan before testing

The first thing you need to do is plan everything out properly. I can’t stress this enough. You don’t just go and do it. Before I even think about deploying something, I spend time defining what success looks like. What metrics am I going to watch? What describes a failure that requires immediate rollback? Having these answers up front saves you from making panicked decisions later.
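
One way to force yourself to answer those questions up front is to write the plan down as data. This is only a sketch: the `RolloutPlan` fields and the threshold values are placeholders I picked for illustration, and your own metrics will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutPlan:
    """Failure criteria agreed on before the rollout starts."""

    max_error_rate: float       # fraction of failed requests, e.g. 0.01 for 1%
    max_p95_latency_ms: float   # 95th percentile response time
    min_conversion_rate: float  # business metric floor


def rollback_reasons(
    plan: RolloutPlan,
    error_rate: float,
    p95_latency_ms: float,
    conversion_rate: float,
) -> list[str]:
    """Return every criterion the observed metrics violate (empty means keep going)."""
    reasons = []
    if error_rate > plan.max_error_rate:
        reasons.append(f"error rate {error_rate:.2%} exceeds {plan.max_error_rate:.2%}")
    if p95_latency_ms > plan.max_p95_latency_ms:
        reasons.append(f"p95 latency {p95_latency_ms:.0f} ms exceeds {plan.max_p95_latency_ms:.0f} ms")
    if conversion_rate < plan.min_conversion_rate:
        reasons.append(f"conversion {conversion_rate:.2%} below {plan.min_conversion_rate:.2%}")
    return reasons


if __name__ == "__main__":
    plan = RolloutPlan(max_error_rate=0.01, max_p95_latency_ms=500, min_conversion_rate=0.02)
    print(rollback_reasons(plan, error_rate=0.05, p95_latency_ms=300, conversion_rate=0.03))
```

Once the plan is an object like this, “should we roll back?” becomes a function call instead of a debate.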

Implementation phase

Next up is the implementation phase, where you actually build your testing capabilities into the application. This means setting up feature flags, implementing monitoring hooks, and making sure your rollback procedures actually work.

I learned this the hard way: test your rollback before you need it. Then, when it comes to execution, I always start conservatively. My typical approach is internal users first, then maybe 1% of production traffic, then a gradual increase if all is good.

This Facebook model really influenced how I think about this, as they test with employees first, then small groups of external users, then gradually roll out globally.

Active monitoring

Monitoring during the process is absolutely critical. You need real-time visibility into what’s happening. I set up alerts for key metrics like error rates, response times, and business metrics relevant to what I’m testing. The monitoring has to be good enough that you can detect problems within minutes, not hours. Remember, you’re actively making decisions about whether to continue, pause, or roll back.
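
Detecting problems “within minutes” usually comes down to computing metrics over a short sliding window. Here’s a rough sketch of an error-rate check over the last five minutes. The class name, the 2% threshold, and the minimum of 50 requests are assumptions for the example, and in practice your monitoring platform would do this for you.

```python
from collections import deque


class ErrorRateMonitor:
    """Sliding-window error rate; timestamps must be fed in non-decreasing order."""

    def __init__(
        self,
        window_seconds: float = 300.0,
        threshold: float = 0.02,
        min_requests: int = 50,
    ) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_requests = min_requests
        self._events = deque()  # (timestamp, ok) pairs, oldest first

    def record(self, timestamp: float, ok: bool) -> None:
        """Add one request outcome and drop events that left the window."""
        self._events.append((timestamp, ok))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def error_rate(self) -> float:
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def should_alert(self) -> bool:
        """Alert only when there is enough traffic to trust the rate."""
        return (
            len(self._events) >= self.min_requests
            and self.error_rate() > self.threshold
        )


if __name__ == "__main__":
    monitor = ErrorRateMonitor()
    for second in range(100):
        monitor.record(float(second), ok=(second % 20 != 0))
    print(monitor.error_rate(), monitor.should_alert())
```

The minimum request count matters at 1% traffic: with only a handful of requests in the window, a single failure would otherwise trigger an alert.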

What I’ve found is that the process becomes much smoother once you’ve done it a few times.

Testing in production: benefits

Okay, so let’s talk about why you’d actually want to test in production. It sounds risky at first, but once you understand the benefits, it starts to make a lot more sense.

Accuracy

The biggest advantage is accuracy. When you test in production, you’re testing with real data, real users, and real conditions. No matter how hard you try to make your staging environment mirror production, it’s never going to be exactly the same.

Faster feedback loops

Another huge benefit is faster feedback loops. Instead of waiting weeks or months to find out if users actually like a feature, you can get the feedback immediately.

Continuous deployment

Testing in production also forces you to deploy more frequently, which might sound counterintuitive but is actually a good thing. When you know you can test safely in production, you’re more likely to make smaller, incremental changes instead of these big bang releases that are harder to debug.

Cost savings

The cost savings are pretty significant too. Instead of maintaining expensive staging environments that try to mirror production infrastructure, you can test directly in production with proper safeguards. Plus, you catch issues faster.

Drawbacks of production testing

Now, let’s be realistic about the downsides, since there are some real risks you need to understand and plan for.

The most obvious risk is user impact. When something goes wrong in production testing, real users experience it. It’s stressful, and it can damage user trust if you’re not careful about how you handle issues.

Data integrity is another serious concern. Production testing means you’re working with real customer data, and if something goes wrong, you could corrupt or lose important information.

Complexity overhead becomes a real problem as you implement more production testing capabilities. Feature flags, monitoring systems, and rollback procedures—all of this adds complexity to your codebase and infrastructure.

Monitoring fatigue is something people don’t talk about enough. When you’re constantly watching dashboards during deployments, it gets exhausting.

Finally, there’s the psychological stress factor. Knowing that your tests could impact real users creates pressure on development teams. Some people thrive under that pressure, but others find it overwhelming.

Testing in production: best practices

Now let’s talk about how to do production testing without shooting yourself in the foot.

First and most important, start small and build confidence gradually. Don’t jump straight into testing your critical payment system with 50% of users. Start with something low risk, maybe internal tools or non-critical features. Use feature flags religiously. They give granular control over who sees what features.

Implement comprehensive monitoring before you test anything. You need to know immediately when something goes wrong. Set up alerts for error rates, response times, and key business metrics. Your monitoring tool has to be good enough to detect problems within minutes, not hours.

Have an automated rollback procedure ready. Set up automatic triggers based on key metrics. Document everything and establish clear processes. When things go wrong in production, you don’t want to be figuring out procedures on the fly.

And finally, maintain environmental isolation even in production testing. Use test accounts, separate your test data from real user data, and make sure your tests don’t interfere with normal operations.

Wherever possible, replace real data with masked or synthetic data to protect sensitive information. That way, you can validate your application’s functionality while reducing compliance risks.
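
As a small illustration of masking, here’s a sketch that pseudonymizes an ID with a keyed hash and hides most of an email address. The field names (`user_id`, `email`) and helper functions are hypothetical, and whether this level of masking is enough for a given regulation is something to confirm with your compliance team, not something this snippet decides.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a stable keyed hash so records can still be joined."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def mask_record(record: dict, key: bytes) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    if "user_id" in masked:
        masked["user_id"] = pseudonymize(str(masked["user_id"]), key)
    if "email" in masked:
        masked["email"] = mask_email(str(masked["email"]))
    return masked


if __name__ == "__main__":
    # The key is a placeholder; a real one would come from a secrets manager.
    secret = b"replace-with-a-real-secret"
    print(mask_record({"user_id": "u-1", "email": "alice@example.com", "plan": "pro"}, secret))
```

Using a keyed hash instead of a plain one means someone without the key can’t rebuild the mapping just by hashing a list of known IDs.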

Remember, production testing is about being more careful, not less careful.

Conclusion

Testing in production still sounds scary to a lot of people. But here’s the thing—as Charity Majors puts it, “Testing in production is a superpower. It’s our inability to acknowledge that we’re doing it, and then invest in the tooling and training to do it safely, that’s killing us.”

The reality is that there’s truly no place like production when it comes to understanding how your software actually behaves in the real world. But that doesn’t mean you should be reckless about it. My conservative approach has always been about progressive delivery. Start small, build confidence, and gradually expand your testing scope.

It’s key to have the right tooling and process in place. You need good monitoring, automated rollback procedures, and a team that understands both the benefits and the risks. For instance, Tricentis provides test coverage and risk-based testing capabilities that give you the confidence to test safely in production.

The future of software development isn’t about avoiding production testing; it’s about doing it better.

This post was written by David Snatch. David is a cloud architect focused on implementing secure continuous delivery pipelines using Terraform, Kubernetes, and any other awesome tech that helps customers deliver results.

Author:

Guest Contributors

Date: Oct. 29, 2025
