SRE + performance engineering: Improve collaboration to release performant applications faster
Shift left with observability to release quality apps faster. Join...
Editor’s notes: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact Wayne at wyaddow@gmail.com.
A characteristic of DW development is the frequent release of high quality data for user feedback and acceptance. At the end of each iteration of DW ETLs, data tables are expected to be of sufficient quality for the next ETL phase. This objective requires a unique approach to quality assurance methods and tools. Foremost it means integrating QA efforts and automation into ETL development iterations.
Essential to integrated ETL testing is test automation. Manual testing is not practical in a highly iterative and adaptive development environment. There are two key problems with manual testing.
First, it takes too long and is therefore a significant inhibitor to the frequent delivery of working software. Teams that rely primarily on manual testing ultimately end up deferring testing until dedicated testing periods, which allows bugs to accumulate.
Second, manual ETL testing is not sufficiently repeatable for regression testing. While seeking to embrace and adapt to change, we must always be confident that features that were “done, complete!” in previous iterations retain their high quality considering the changing systems.
Test automation requires initial planning and ongoing diligence, but once technical teams embrace automation, project success is more assured.
Data warehouse test automation is described as the use of tools to control 1) the execution of tests, 2) the comparison of actual outcomes to predicted outcomes, 3) the setting up of test preconditions, and other test control and test reporting functions. Commonly, test automation involves automating a manual process already in place that uses a formalized testing process.
Although manual ETL tests may find many data defects, it is a laborious and time-consuming process. In addition, manual tests may not be effective in finding certain classes of defects. DW test automation involves writing programs for testing that would otherwise need to be done manually. Once tests have been automated, they can be run quickly and repeatedly. This is often the most cost-effective method for a DW that may have a long maintenance life because even minor patches or enhancements over the lifetime of the warehouse can cause features to break which were working earlier.
Integrated automated testing in ETL development presents a unique set of challenges. Current automated testing tools designed for software development are not easily adaptable for database and ETL projects; large data volumes can make automated testing a daunting task. The wide variety of DW architectures further complicate these challenges because they involve multiple databases which require special coding for data extraction, transformations, and loading, also data cleansing, data aggregations, data enrichments.
Test automation tools can be expensive; the tools are usually employed in combination with manual testing. However, they can become cost-effective in the longer term, particularly when used repeatedly in regression testing.
What to automate, when to automate, or even whether one really needs automation are crucial decisions which the testing (or development) team must make. Selecting the correct features of the product for automation largely determines the success of the automation.
Automating unstable features or features that are undergoing changes should be avoided. Today, there is no known commercial tool or a set of methods / processes that can be said to represent comprehensive end to end DW testing. Leading ETL tool vendors from Informatica, Microsoft, and others do not promote any one test tool as the answer to ETL automated testing.
Automation of these tests may also be required due to the complexity of processing and the number of sources and targets that require checks. Without automation, these checks are often costly or in some cases impossible to do manually.
For most DW development, the ETL test processes are designed to check for and implement data quality. Therefore, selection of a high-function ETL tool (e.g., Informatica, SSIS, DataStage, in-house developed, etc.) is a serious concern for the DW project.
The trick is to figure out what needs to be automated, and how to approach this task. An array of questions must be considered in the course of automating tests such as:
Figure 1 shows examples of time durations for manual vs. automated test cases for a sample set of test cases.
Figure 1. Comparing manual testing execution time vs. testing execution time
A primary objective of automated DW testing is coverage of the most critical functions of loading a DW – synchronization and reconciliation of source and target data.
Not all ETL testing is suitable for automation. Assess the situations mentioned above to determine which types of automation would benefit the testing process and how much is needed. Evaluate testing requirements and identify inefficiencies that may be fixed with automated testing. QA teams that spend a lot of time on regression testing will benefit the most.
Develop a business case for automated testing. Automated testing generally comes at an additional cost over manual testing, so in order to convey the value to the business, IT must first make the case.
Evaluate the options. After evaluating the current state and requirements within the IT department, investigate which tools fit the organization’s testing processes and environments. Options may include vendor, open source, in-house, or a combination of the tools.
A principle of agile and other modern development is automated testing. We can apply this awareness to DW.
An important DW testing consideration is that the number of tests that are run will continue to grow to verify added functionality.
Figure 2. End-to-end ETL testing
As shown in Figure 2, there are four major areas of end-to-end ETL testing. For data automation testing, the emphasis at each test entry point is the validation of data integrity.
When implementing test automation, data can be tracked from the source layer, through ETL processing, to loads in the DW, then finally to the front-end applications or reports. If corrupt data is found in a front-end application or report, the execution of automated suites can help to more rapidly determine whether the problem is located in the data source, an ETL process, in a DW database/data mart or in the business intelligence reports.
For performance testing, the same test entry points are utilized to focus on characterizing subsystem response under load. A similar strategy is used for data integrity validation to determine the origin of performance issues. Subsystem performance can be measured at any of the identified test entry points.
An emphasis on rapid localization of either data and performance problems in complex DW architectures provides a key tool for promoting efficiencies in development, for shortening build cycles, and meeting release targets.
Automated testing tools are most helpful when used in the following circumstances.
As test automation has rapidly become an essential alternative to manual testing, more companies are looking for tools and strategies to successfully apply automation. This trend has led to the significant growth in test automation tools based on Selenium, Katalon Studio, Appium, and many more.
However, to fully utilize these automation tools, DW and BI QA teams must possess appropriate programming skills. Adding to that. businesses are becoming more agile and applications are updated more frequently.
Many IT experts have predicted that ongoing, the knowledge gap between the testers and developers must and will be narrowed. Automated ETL testing tools can significantly reduce the amount of time spent testing code in comparison to traditional manual methods. Other benefits include the creation of reusable code and a reduction in costs associated with personnel and rework.
Gain knowledge about automated DW testing and tools to decide if it’s right for your QA project.
Shift left with observability to release quality apps faster. Join...
Explore common regression testing challenges faced by Agile teams –...
Ensure reliable, scalable application performance across on-prem,...
Ensure SAP data accuracy & reliability. Learn risks, key...
Watch this webinar to learn some advanced strategies for...