Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at firstname.lastname@example.org.
Each data warehouse contains large amounts of data that the business uses for strategic decision making. Source data is gathered from heterogeneous internal and external data sets, then populated and queried to provide for business intelligence reporting.
To ensure the integrity of this data and the decisions that rely upon it, data warehouse testing should be first planned project-wide then carefully implemented. Such testing is critical for removing the inconsistencies that occur due to data being collected from widely different sources and formats. If left uncorrected, these inconsistencies are likely to affect the resilience of the many ETL processes.
Broad Categories of DWH/BI Testing
DWH/BI testing typically falls into the following categories:
- Project requirements — completeness validations
- Developer unit testing
- Component and integration testing across the broad end-to-end DWH spectrum
- Acceptance testing (e.g., BI report testing).
Types of DWH/BI Component and Integration Testing
In this writing, I focus on only “component and integration testing” from the above categories. Component and integration testing typically verifies the compatibility of DWH modules in end-to-end ETL streams.
Component and integration testing is often accomplished in incremental order consisting of multiple stages after integration of every component to ensure that they work together. The process of incremental integration testing of components continues until the full functional system is obtained as required.
Integration tests often involve running end-to-end ETL routines that may invoke dozens of components. To ensure that all component processing completed as expected, the developers and QA team will want to determine whether the correct number of data mapping ETL’s executed and whether key business rules were properly applied. In other words, they will want the integration test to repeat many of the unit and individual component tests.
DWH component and integration testing can include the following types of tests:
- Functional and Data Validation Tests – Checking to ensure data is loaded into the system according to the business rules.
- Data Integration Tests – Assessing the initial data loaded, as well as incremental data that continues to be loaded in real-time or frequently due to modifications, updates, and transformations. In this phase, the quality of the system is tested from the lowest level.
- BI Report Tests – Verifying/reconciling report data with the (often transformed) data source.
- User Acceptance Tests – Verifying that requirements and business rules have been met— confirming that the system is acceptable to the client.
- System Tests – Checking the resultant system, as a whole, for errors and correct modifications.
- Regression Tests – Verifying that the system is functioning properly after functional (code) and data updates have been applied.
- Performance Tests – Verifying the effectiveness of the system under specific workloads.
- Usability Tests – Having users interact with the system to evaluate its ease of use and intelligibility.
- Recovery Tests – Assessing how the system recovers from crashes, failures, etc.
- Security Tests – Ensuring data in the system is secure under the required working conditions.
- Stress Tests – Evaluating system performance under heavy workload
The Challenges of DWH/BI Quality Assurance
The challenges of ETL testing are substantially different from conventional software testing. For example, top challenges include:
- Incompatible and duplicate data
- Loss of data during ETL process
- Unavailability of a fully inclusive test bed
- Testers with inadequate skills and privileges to execute ETL jobs
- Volume and complexity of data is likely vast
- Faults in business process and procedures
- Trouble acquiring and building test data
- Missing business flow information
A dedicated QA team with extensive ETL test process and tool knowledge can help your team address these challenges as they plan and implement the above types of testing. As the graphic below shows, the QA team would be developing the test strategy, categories of important tests, planning and implementing test automation tools and process, test scenarios, test cases, and more. Doing so, business analysts, data analysts, developers, etc., will be freed to carry out their assigned critical tasks.
Improving the quality of reported data and meeting the business intelligence requirements for your project is essential for building trust in the tested data warehouse and business intelligence reports created from it.