Clearly the answer is no. Yet we, the “enterprise software developers & testers”, take this type of business risk every day by releasing to production software that is not completely tested.
We test what we know has changed. Change is defined by requirements, product backlogs and sprint backlogs. Requirements lead to code and test cases. Once test cases exist, we know how to run them, write automation scripts for them and do regression testing. During this process, we may also be able to test for both change and change impact. Change impact is subtle; it is important to understand how a change affects other parts of the application and to test for that. This is not as easy or straightforward as testing for the change itself, which is more easily quantified. With experience, developers and testers learn to assess change impact and test for it.
What we don’t test are the known unknowns. What are known unknowns? These are things for which we may not have requirements, or for which the requirements we do have are vague. The test cases don’t exist.
For example, we know that we cannot define and test our mobile apps on every possible combination of device, OS and browser. The usual practice is to define a set of the most commonly used combinations and test those. The challenge comes when a change in the device, OS and/or browser software creates a combination that we have not tested. These changes are happening all the time, usually completely unknown to us, because device, OS and browser vendors work independently of each other and follow their own product roadmaps.
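One way to make this gap visible is to compare the combinations in the test matrix with the combinations actually seen in production. The following is a minimal sketch, assuming production analytics can report a device, OS and browser for each session; the `Combo` type, the `untested_combos` function and the sample values are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import NamedTuple


class Combo(NamedTuple):
    """One device / OS / browser combination (hypothetical structure)."""
    device: str
    os: str
    browser: str


def untested_combos(tested, observed):
    """Return (combo, session_count) pairs for combinations that were
    observed in production but are missing from the test matrix,
    most frequently seen first."""
    tested_set = set(tested)
    counts = Counter(c for c in observed if c not in tested_set)
    return counts.most_common()


# Made-up sample data, for illustration only.
TESTED = [
    Combo("phone-a", "os-1", "browser-1"),
    Combo("phone-b", "os-2", "browser-1"),
]
OBSERVED = [
    Combo("phone-a", "os-1", "browser-1"),
    Combo("phone-a", "os-1", "browser-2"),  # browser updated after testing
    Combo("phone-a", "os-1", "browser-2"),
    Combo("phone-b", "os-2", "browser-1"),
]

if __name__ == "__main__":
    for combo, sessions in untested_combos(TESTED, OBSERVED):
        print(f"untested: {combo.device} / {combo.os} / {combo.browser} "
              f"({sessions} sessions)")
```

Sorting by session count is a design choice: it puts the untested combinations that affect the most users at the top of the list, which is where extra test effort is likely to pay off first.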
Another example: all our tests are done in the comfort of a lab setting, where we get good signal, good bandwidth and few interruptions. When we run our mobile tests, we are not running to catch a flight, shopping at a store, or playing games, talking or listening to music on the device while trying to use the app. Most of the time these factors are ignored during testing.
We also know that power outages and other types of disruptive events can take our application down, yet we don’t test for these on a regular basis.
We do most of our testing prior to production, in test and staging environments that are not the same as production. The data sets, the integrations with internal and external systems and so on differ considerably between the test and staging environments and production. These differences mask issues before deployment, and serious issues surface only when the software is deployed to production.
Now that we have identified these factors (the known unknowns) that can impact overall software quality, what can we do about them?
We need a strategy to test more of the known unknowns. One approach is to test the software in the production environment, with real-world users using their own devices. “Testing in production” is considered blasphemy by the traditional testing community. We have to rethink this old belief and question its assumptions. Today many leading enterprises are “testing in production” and improving the quality of their software.
What does in-production testing mean?
In-production testing means we extend the testing phase beyond testing in staging. As we move the software along the code promotion path, we deploy it to production in a controlled manner, so that only a limited set of users experiences the newly deployed release. This limits downside exposure and allows testing to be done without impacting every user in production. In this mode, tests for some of the known unknowns can be conducted. Once there is confidence, the roll-out can be expanded to more users, still in a controlled manner, until the software is ready for general availability (GA).
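A controlled roll-out of this kind is often implemented as a percentage gate. The sketch below is one possible way to do it, not a description of any particular tool: it hashes a user ID into a stable bucket from 0 to 99 and exposes the new release only to users whose bucket falls below the current roll-out percentage. The function names, the release name and the user IDs are all hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, release: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given release.

    Hashing (rather than random choice) keeps the assignment stable, so
    the same user gets the same answer on every request.
    """
    digest = hashlib.sha256(f"{release}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_new_release(user_id: str, release: str, percent: int) -> bool:
    """Return True if the user is inside the current roll-out percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(user_id, release) < percent


if __name__ == "__main__":
    # Hypothetical user IDs and release name, for illustration only.
    users = [f"user-{n}" for n in range(1000)]
    for percent in (5, 25, 100):
        exposed = sum(sees_new_release(u, "release-x", percent) for u in users)
        print(f"{percent:3d}% roll-out -> {exposed} of {len(users)} users exposed")
```

Because each user's bucket never changes, raising the percentage only adds users: anyone who saw the new release at 5% still sees it at 25%. Including the release name in the hash means a different subset of users goes first for each release, so the same people are not always the early testers.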
Are there tools and techniques to support this approach?
Several tools and techniques are available to support in-production testing, including tools for controlled roll-out and for real-world testing. This is a multi-part blog, and subsequent blogs will discuss the tools and techniques in more detail.
Who “tests in production” today?
Most unicorns do this today. Enterprises with mature DevOps practices do this as well. That is how they can deliver higher quality despite accelerated software churn. In subsequent blogs we will explore a real-life example of how they do it and the lessons learned.
What type or class of software is a candidate for this type of testing?
The most obvious candidates are web and/or mobile applications that are targeted at end users, generate revenue, or serve the needs of the community at large.
Key takeaway: It is important to evaluate and assess the quality of the software your enterprise is delivering to your customers today. That will answer the key question: how are you handling your known unknowns? In subsequent blogs we will look at the tools and techniques that support this approach. We will also learn from practitioners who are successfully using it.