Reason for Topic
When testing software, it’s not enough to know how the app should function; apps and services often behave differently depending on their input. The data used in testing is often as important as, if not more important than, covering every path people can take through an app.
However, obtaining representative and accurate data for testing can be a time-consuming and costly effort. For new features and capabilities, there are often no production examples, and even when there are, sanitizing sensitive data from production for use in a lower environment is fraught with challenges. This is a very old and persistent problem in software development: where do you get the ‘right’ data for testing, and how do you even know what ‘right’ means?
Introduction / Definition
On-demand test data describes the outcome of a process that delivers the right data for testing in an appropriate amount of time (in this case, close to real time). There are multiple ways to achieve this, depending on the situation.
For situations that need a high degree of fidelity to production data and usage, a process of ‘sanitization’ is often employed. This usually combines automated or autonomous identification and masking of sensitive data elements with controls that let teams explicitly define known issues, data types, and compliance requirements.
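As a rough illustration of how explicit controls and automated masking can combine, here is a minimal Python sketch. The field names, the EXPLICIT_RULES table, and the email pattern are all assumptions made up for this example; they do not come from any particular sanitization product, and a real pipeline would need far broader detection.

```python
import hashlib
import re

# Hypothetical explicit rules a team might declare for known sensitive fields.
EXPLICIT_RULES = {"ssn": "redact", "email": "pseudonymize"}

# Deliberately simplified pattern for spotting email-like values in free text.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymize(value):
    """Replace a value with a stable token so joins across tables still work.

    An unsalted hash is used for brevity; a real pipeline would likely
    use a keyed hash so tokens cannot be recomputed from guessed inputs.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.test"


def sanitize_record(record):
    """Apply explicit rules first, then an automated pass on other strings."""
    clean = {}
    for field, value in record.items():
        rule = EXPLICIT_RULES.get(field)
        if rule == "redact":
            clean[field] = "REDACTED"
        elif rule == "pseudonymize":
            clean[field] = pseudonymize(str(value))
        elif isinstance(value, str):
            # Automated pass: mask anything that looks like an email address.
            clean[field] = EMAIL_PATTERN.sub("[masked-email]", value)
        else:
            clean[field] = value
    return clean
```

Calling sanitize_record on a production-like row returns a copy in which the declared fields are redacted or tokenized and email-like strings in any other text field are masked. Non-string values pass through unchanged.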
For situations where using production data is not preferred or not possible (such as new features not yet released to production users), ‘synthetic data’ can be used to automatically generate data sets for testing. Usually based on a model of the desired data, synthetic data generation is fast and repeatable, which suits continuous software delivery and automated pipelines in particular. This matters for modern software teams, since nearly every team needs test data to some degree.
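To make "based on a model" and "repeatable" concrete, the following Python sketch generates records from a small, hypothetical data model using a seeded random generator. CUSTOMER_MODEL, its fields, and its value ranges are invented for illustration only; the point is that the same seed yields the same data set on every pipeline run.

```python
import random

# Hypothetical data model: each field maps to a function that draws a value
# from the supplied random generator.
CUSTOMER_MODEL = {
    "customer_id": lambda rng: rng.randint(100000, 999999),
    "plan": lambda rng: rng.choice(["free", "pro", "enterprise"]),
    "monthly_spend": lambda rng: round(rng.uniform(0, 500), 2),
    "active": lambda rng: rng.random() < 0.8,
}


def generate_records(model, count, seed):
    """Generate `count` records from `model`, reproducibly for a given seed."""
    rng = random.Random(seed)  # Local generator, so global state is untouched.
    return [
        {field: make(rng) for field, make in model.items()}
        for _ in range(count)
    ]
```

A pipeline step could call generate_records(CUSTOMER_MODEL, 1000, seed=42) before each test run. Changing the seed produces a different but equally valid data set, while keeping it fixed makes failures reproducible.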
Benefits & Examples
A major benefit of on-demand test data is that it can be tailored to specific testing needs. With manual test data, testers may not always have control over the data that is used, which can limit the scope of testing. On-demand test data, on the other hand, can be generated to include specific scenarios, such as edge cases or unusual user behavior, that may not be easily replicated with static data. This can help improve the accuracy and thoroughness of testing, leading to more reliable software.
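One simple way to tailor generated data toward edge cases is to emit values at and just beyond the limits of an allowed range. The sketch below assumes a hypothetical numeric field with a known minimum and maximum; the function name and the example range are illustrative only.

```python
def boundary_values(minimum, maximum):
    """Return values at and just around the edges of an allowed range."""
    candidates = [
        minimum - 1, minimum, minimum + 1,
        maximum - 1, maximum, maximum + 1,
    ]
    # Drop duplicates (which occur for very narrow ranges) while keeping order.
    return list(dict.fromkeys(candidates))


# Example: an order quantity field assumed to allow 1 through 100.
quantities = boundary_values(1, 100)
```

Here quantities holds 0, 1, 2, 99, 100, and 101, covering both the valid limits and the first invalid values on either side.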
On-demand test data is also often more cost-effective than manually generated test data. Creating and maintaining a large manually built dataset can be time-consuming and expensive, particularly as the size and complexity of the software under test increase. On-demand test data, by contrast, can be generated quickly using algorithms and automation tools. This can reduce the time and resources needed for testing, making the overall process more efficient.
Drawbacks / Gotchas
With sanitization, no matter how much automated masking and how many explicit controls are in place, there’s always a non-zero risk that sensitive data will leak through the process, which in turn requires additional monitoring and auditing resources.
The classic approach to synthetic data requires manually designing the data model, which must itself be tested; this can be time-consuming and calls for a highly specialized skill set. More recent developments use AI-assisted process monitoring, for example during manual testing, to automatically recommend and develop test data models that reflect actual use, augmented by intelligent guesses about what data is needed.
Overall, on-demand test data offers a range of advantages over static, manually generated test data. By enabling faster, more tailored, and more cost-effective testing, it can help software development teams deliver higher-quality software more quickly and efficiently.